In Opik 2.0, datasets and experiments are project-scoped. Make sure to specify a project_name when creating datasets and running experiments so they are associated with the correct project.
Hugging Face Datasets is a library that provides easy access to thousands of datasets for machine learning and natural language processing tasks.
This guide explains how to integrate Opik with Hugging Face Datasets to convert and import datasets into Opik for model evaluation and optimization.
Comet provides a hosted version of the Opik platform: simply create an account and grab your API key.
You can also run the Opik platform locally; see the installation guide for more information.
To use Hugging Face Datasets with Opik, install both the datasets and opik packages, for example with pip install datasets opik.
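After installing, a quick standard-library check like the one below can confirm that both packages are importable from the active environment. It is a small convenience sketch, not part of either library; check_required_packages is a hypothetical helper name.

```python
import importlib.util


def check_required_packages(names=("datasets", "opik")):
    """Report which of the given top-level packages can be imported."""
    # find_spec returns None when a top-level module is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    status = check_required_packages()
    missing = [name for name, ok in status.items() if not ok]
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("datasets and opik are installed")
```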
Configure the Opik Python SDK for your deployment type, whether you use the hosted platform or a local installation. You can run opik configure from the command line or call opik.configure() from Python; see the Python SDK Configuration guide for detailed instructions.

To access private datasets on Hugging Face, you need a Hugging Face token. You can create and manage your tokens on the access tokens page of your Hugging Face account settings.
You can set the token as the HF_TOKEN environment variable in your shell, or set it programmatically from Python before loading any datasets.
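A minimal sketch of the programmatic route, assuming the token is read from the HF_TOKEN environment variable (the name the huggingface_hub library checks). Set it early, ideally before importing datasets. set_hf_token is a hypothetical helper name, and the token string in the comment is a placeholder.

```python
import os
from typing import Optional


def set_hf_token(token: Optional[str] = None) -> Optional[str]:
    """Expose a Hugging Face token through the HF_TOKEN environment variable.

    A non-empty token overrides any existing value; otherwise the current
    HF_TOKEN (if any) is left untouched. Returns the token now in effect.
    """
    if token:
        os.environ["HF_TOKEN"] = token
    return os.environ.get("HF_TOKEN")


# Usage: call before loading private datasets.
# set_hf_token("hf_...")  # placeholder; keep real tokens out of source control
```

Alternatively, recent versions of datasets accept the token directly, as in load_dataset(name, token=...).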
The integration provides a utility class for converting Hugging Face datasets to the Opik format.
The conversion workflow loads a Hugging Face dataset, converts its rows to the Opik format, and uploads the resulting items to an Opik dataset.
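The workflow might look like the sketch below. The batching helpers are plain Python and can be tested on their own. load_convert_upload, its parameters, and the question/answer column mapping are hypothetical, and the Opik calls (opik.Opik, get_or_create_dataset, insert) assume a current version of the Python SDK. Where exactly project_name must be passed in Opik 2.0 is an assumption; check the SDK reference for your version.

```python
from typing import Any, Dict, Iterator, List, Optional, Sequence


def batched(items: Sequence[Dict[str, Any]], batch_size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def upload_items(dataset: Any, items: Sequence[Dict[str, Any]], batch_size: int = 100) -> int:
    """Insert items into an Opik dataset chunk by chunk; returns how many were sent.

    dataset can be any object with an insert(list_of_dicts) method, such as the
    object returned by get_or_create_dataset on an Opik client.
    """
    sent = 0
    for chunk in batched(items, batch_size):
        dataset.insert(chunk)
        sent += len(chunk)
    return sent


def load_convert_upload(hf_name: str, split: str, opik_dataset_name: str,
                        project_name: str, subset_size: Optional[int] = None) -> int:
    """Hypothetical end-to-end flow: load from the Hub, convert, upload to Opik."""
    # Local imports so the helpers above work without these packages installed.
    import opik
    from datasets import load_dataset

    hf_dataset = load_dataset(hf_name, split=split)
    if subset_size is not None:
        # Dataset.select takes row indices; keep only the first subset_size rows.
        hf_dataset = hf_dataset.select(range(min(subset_size, len(hf_dataset))))

    # Hypothetical column mapping: replace "question"/"answer" with your columns.
    items = [
        {"input": {"question": row["question"]}, "expected_output": row["answer"]}
        for row in hf_dataset
    ]

    # Assumption: the client's project_name associates the dataset with that project.
    client = opik.Opik(project_name=project_name)
    dataset = client.get_or_create_dataset(name=opik_dataset_name)
    return upload_items(dataset, items)
```

Uploading in chunks keeps each insert call bounded in size; the SDK may already batch internally, in which case a single insert(items) call is equally valid.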
The converter provides a method that transforms Hugging Face datasets into the item format Opik expects.
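The original class is not reproduced in this text, so the following is a hypothetical stand-in: HuggingFaceToOpikConverter, its methods, and the input/expected_output/metadata item layout are assumptions. It maps selected columns to the input, one column to the expected output, and keeps the remaining columns as metadata; the optional subset_size caps how many rows are converted.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List, Optional


class HuggingFaceToOpikConverter:
    """Hypothetical converter from Hugging Face rows to Opik-style item dicts."""

    def __init__(self, input_columns: List[str], expected_output_column: Optional[str] = None):
        self.input_columns = input_columns
        self.expected_output_column = expected_output_column

    def convert_row(self, row: Dict[str, Any]) -> Dict[str, Any]:
        """Map one row to an item with input, optional expected_output, and metadata."""
        item: Dict[str, Any] = {"input": {col: row[col] for col in self.input_columns}}
        used = set(self.input_columns)
        if self.expected_output_column is not None:
            item["expected_output"] = row[self.expected_output_column]
            used.add(self.expected_output_column)
        # Keep every other column as metadata so no information is lost.
        item["metadata"] = {k: v for k, v in row.items() if k not in used}
        return item

    def convert(self, rows: Iterable[Dict[str, Any]],
                subset_size: Optional[int] = None) -> List[Dict[str, Any]]:
        """Convert rows (any iterable of dicts), stopping after subset_size if given."""
        selected = rows if subset_size is None else islice(rows, subset_size)
        return [self.convert_row(row) for row in selected]
```

Because iterating a Hugging Face Dataset yields one dict per row, the same convert call works on the result of load_dataset(...) as on a plain list of dicts.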
Use the @track decorator to trace the functions that process your converted datasets; each decorated call is logged to Opik as part of a trace.
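A sketch of tracing dataset processing, assuming items shaped like {"input": {"question": ...}}. build_prompt and answer_item are hypothetical, and the LLM call is replaced by a fixed placeholder string. The ImportError fallback exists only so the example runs without opik installed.

```python
from typing import Any, Dict

try:
    from opik import track
except ImportError:
    # No-op fallback so this sketch runs without opik; the real decorator
    # logs each decorated call to Opik.
    def track(func):
        return func


@track
def build_prompt(item: Dict[str, Any]) -> str:
    """Turn a converted dataset item into a prompt string (hypothetical format)."""
    question = item["input"]["question"]
    return f"Answer the question concisely.\nQuestion: {question}"


@track
def answer_item(item: Dict[str, Any]) -> Dict[str, Any]:
    """Process one dataset item; the nested build_prompt call is traced too."""
    prompt = build_prompt(item)
    # Placeholder for a real LLM call.
    output = "placeholder answer"
    return {"prompt": prompt, "output": output}
```

With opik installed and configured, calling answer_item logs a trace, and the nested build_prompt call appears within it.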
Once your Hugging Face datasets are converted and uploaded to Opik, you can view them in the Opik UI, where each dataset lists the items inserted during conversion.
Once your Hugging Face datasets are in Opik, you can evaluate your LLM applications against them using Opik's evaluation framework.
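A sketch of an evaluation run, assuming dataset items with a question under input and an expected_output field. my_llm_app is a hypothetical stand-in for your application, and run_evaluation's name, parameters, and experiment name are illustrative. The task returns both output and reference because the Equals metric scores those two arguments.

```python
from typing import Any, Dict


def my_llm_app(question: str) -> str:
    """Hypothetical stand-in for your LLM application."""
    return "4" if "2+2" in question else "I don't know"


def evaluation_task(item: Dict[str, Any]) -> Dict[str, Any]:
    """Called once per dataset item; the returned keys are made available to metrics."""
    output = my_llm_app(item["input"]["question"])
    # Equals scores output against reference, so return both.
    return {"output": output, "reference": item.get("expected_output", "")}


def run_evaluation(dataset_name: str, project_name: str):
    """Evaluate evaluation_task over an Opik dataset with an exact-match metric."""
    # Local imports keep the task above usable without opik installed.
    import opik
    from opik.evaluation import evaluate
    from opik.evaluation.metrics import Equals

    client = opik.Opik(project_name=project_name)
    dataset = client.get_dataset(name=dataset_name)
    return evaluate(
        dataset=dataset,
        task=evaluation_task,
        scoring_metrics=[Equals()],
        experiment_name="hf-datasets-eval",  # illustrative name
        project_name=project_name,  # experiments are project-scoped in Opik 2.0
    )
```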
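Make sure the environment variables your setup relies on are set before running these examples, such as your Opik API key when using the hosted platform and your Hugging Face token when accessing private datasets.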
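A small check can fail fast when required variables are missing. The default names, OPIK_API_KEY and HF_TOKEN, are assumptions about a typical hosted setup with private datasets; pass your own list otherwise. missing_env_vars is a hypothetical helper name.

```python
import os
from typing import Iterable, List

# Assumed names: OPIK_API_KEY for the hosted Opik platform and HF_TOKEN for
# private Hugging Face datasets. Adjust to match your own setup.
DEFAULT_REQUIRED = ("OPIK_API_KEY", "HF_TOKEN")


def missing_env_vars(names: Iterable[str] = DEFAULT_REQUIRED) -> List[str]:
    """Return the names that are unset or set to an empty string."""
    return [name for name in names if not os.environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
```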
For large datasets, use the subset_size parameter to limit how many rows are converted and uploaded.

Once you have Hugging Face Datasets integrated with Opik, you can: