Observability for Hugging Face Datasets with Opik

Hugging Face Datasets is a library that provides easy access to thousands of datasets for machine learning and natural language processing tasks.

This guide explains how to integrate Opik with Hugging Face Datasets to convert and import datasets into Opik for model evaluation and optimization.

Account Setup

Comet provides a hosted version of the Opik platform; simply create an account and grab your API key.

You can also run the Opik platform locally; see the installation guide for more information.

Getting Started

Installation

To use Hugging Face Datasets with Opik, you'll need the datasets and opik packages installed, along with the supporting libraries used in this guide:

$pip install opik datasets transformers pandas tqdm huggingface_hub

Configuring Opik

Configure the Opik Python SDK for your deployment type. See the Python SDK Configuration guide for detailed instructions on:

  • CLI configuration: opik configure
  • Code configuration: opik.configure()
  • Self-hosted vs Cloud vs Enterprise setup
  • Configuration files and environment variables
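
For example, the code-based option can be as small as the following sketch (the api_key and workspace values are placeholders you replace with your own):

import opik

# Opik Cloud: supply your API key and workspace
opik.configure(api_key="YOUR_API_KEY", workspace="YOUR_WORKSPACE")

# For a self-hosted deployment instead:
# opik.configure(use_local=True)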

Configuring Hugging Face

To access private datasets on Hugging Face, you will need a Hugging Face token. You can create and manage your tokens from your Hugging Face account settings at https://huggingface.co/settings/tokens.

You can set it as an environment variable:

$export HUGGINGFACE_HUB_TOKEN="YOUR_TOKEN"

Or set it programmatically:

import os
import getpass

if "HUGGINGFACE_HUB_TOKEN" not in os.environ:
    os.environ["HUGGINGFACE_HUB_TOKEN"] = getpass.getpass("Enter your Hugging Face token: ")

# Set project name for organization
os.environ["OPIK_PROJECT_NAME"] = "huggingface-datasets-integration-demo"

HuggingFaceToOpikConverter

The integration provides a utility class to convert Hugging Face datasets to Opik format:

from datasets import load_dataset, Dataset as HFDataset
from opik import Opik
from typing import Optional, Dict, Any, List
import json
from tqdm import tqdm
import warnings
import numpy as np
import pandas as pd

warnings.filterwarnings('ignore')

class HuggingFaceToOpikConverter:
    """Utility class to convert Hugging Face datasets to Opik format."""

    def __init__(self, opik_client: Opik):
        self.opik_client = opik_client

    def load_hf_dataset(
        self,
        dataset_name: str,
        split: Optional[str] = None,
        config: Optional[str] = None,
        subset_size: Optional[int] = None,
        **kwargs
    ) -> HFDataset:
        """
        Load a dataset from Hugging Face Hub.

        Args:
            dataset_name: Name of the dataset on HF Hub
            split: Specific split to load (train, validation, test)
            config: Configuration/subset of the dataset
            subset_size: Limit the number of samples
            **kwargs: Additional arguments for load_dataset

        Returns:
            Loaded Hugging Face dataset
        """
        print(f"📥 Loading dataset: {dataset_name}")
        if config:
            print(f"   Config: {config}")
        if split:
            print(f"   Split: {split}")

        # Load the dataset
        dataset = load_dataset(
            dataset_name,
            name=config,
            split=split,
            **kwargs
        )

        # Limit dataset size if specified
        if subset_size and len(dataset) > subset_size:
            dataset = dataset.select(range(subset_size))
            print(f"   Limited to {subset_size} samples")

        print(f"   ✅ Loaded {len(dataset)} samples")
        print(f"   Features: {list(dataset.features.keys())}")

        return dataset

Basic Usage

Convert and Upload a Dataset

Here’s how to convert a Hugging Face dataset to Opik format and upload it:

# Initialize the converter
opik_client = Opik()
converter = HuggingFaceToOpikConverter(opik_client)

# Load a dataset from Hugging Face
dataset = converter.load_hf_dataset(
    dataset_name="squad",
    split="validation",
    subset_size=100  # Limit for demo
)

# Convert to Opik format
opik_data = converter.convert_to_opik_format(
    dataset=dataset,
    input_columns=["question"],
    output_columns=["answers"],
    metadata_columns=["id", "title"],
    dataset_name="squad-qa-dataset",
    description="SQuAD question answering dataset converted from Hugging Face"
)

print(f"✅ Converted {len(opik_data)} items to Opik format!")
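
Note that convert_to_opik_format returns a plain list of items, so the data still has to be pushed to Opik before it appears in the UI. A minimal sketch using the SDK's get_or_create_dataset and insert methods (reusing the name and description from the call above):

# Create (or fetch) the Opik dataset and upload the converted items
opik_dataset = opik_client.get_or_create_dataset(
    name="squad-qa-dataset",
    description="SQuAD question answering dataset converted from Hugging Face",
)
opik_dataset.insert(opik_data)
print(f"✅ Uploaded {len(opik_data)} items to Opik")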

Convert to Opik Format

The converter provides a method to transform Hugging Face datasets into Opik’s expected format:

def convert_to_opik_format(
    self,
    dataset: HFDataset,
    input_columns: List[str],
    output_columns: List[str],
    metadata_columns: Optional[List[str]] = None,
    dataset_name: str = "huggingface-dataset",
    description: str = "Dataset converted from Hugging Face"
) -> List[Dict[str, Any]]:
    """
    Convert a Hugging Face dataset to Opik format.

    Args:
        dataset: Hugging Face dataset
        input_columns: List of column names to use as input
        output_columns: List of column names to use as expected output
        metadata_columns: Optional list of columns to include as metadata
        dataset_name: Name for the Opik dataset
        description: Description for the Opik dataset

    Returns:
        List of Opik dataset items
    """
    opik_items = []

    for row in tqdm(dataset, desc="Converting to Opik format"):
        # Extract input data
        input_data = {}
        for col in input_columns:
            if col in dataset.features:
                input_data[col] = self._extract_field_value(row, col)

        # Extract expected output
        expected_output = {}
        for col in output_columns:
            if col in dataset.features:
                expected_output[col] = self._extract_field_value(row, col)

        # Extract metadata
        metadata = {}
        if metadata_columns:
            for col in metadata_columns:
                if col in dataset.features:
                    metadata[col] = self._extract_field_value(row, col)

        # Create Opik dataset item
        item = {
            "input": input_data,
            "expected_output": expected_output,
            "metadata": metadata
        }
        opik_items.append(item)

    return opik_items
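
The extraction loops above rely on a _extract_field_value helper that the snippet does not define. A minimal sketch of what it might look like (an assumption, not the exact implementation) that coerces NumPy values into JSON-friendly Python types:

def _extract_field_value(self, row: Dict[str, Any], col: str) -> Any:
    """Pull a single field from a row, coercing NumPy types to plain Python."""
    value = row[col]
    if isinstance(value, np.generic):  # NumPy scalar -> Python scalar
        return value.item()
    if isinstance(value, np.ndarray):  # NumPy array -> list
        return value.tolist()
    if isinstance(value, (dict, list, str, int, float, bool)) or value is None:
        return value  # already JSON-serializable
    return str(value)  # fall back to a string representation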

Using with @track decorator

Use the @track decorator to create comprehensive traces when working with your converted datasets:

from opik import track

@track
def evaluate_qa_model(dataset_item):
    """Evaluate a Q&A model using Hugging Face dataset."""
    question = dataset_item["input"]["question"]

    # Your model logic here (replace with actual model)
    if 'what' in question.lower():
        response = "This is a question asking for information."
    elif 'how' in question.lower():
        response = "This is a question asking for a process or method."
    else:
        response = "This is a general question that requires analysis."

    return {
        "question": question,
        "response": response,
        "expected": dataset_item["expected_output"],
        "metadata": dataset_item["metadata"]
    }

# Evaluate on your dataset
for item in opik_data[:5]:  # Evaluate first 5 items
    result = evaluate_qa_model(item)
    print(f"Question: {result['question'][:50]}...")

SQuAD (Question Answering)

# Load SQuAD dataset
squad_dataset = converter.load_hf_dataset(
    dataset_name="squad",
    split="validation",
    subset_size=50
)

# Convert to Opik format
squad_opik = converter.convert_to_opik_format(
    dataset=squad_dataset,
    input_columns=["question"],
    output_columns=["answers"],
    metadata_columns=["id", "title"],
    dataset_name="squad-qa-dataset",
    description="SQuAD question answering dataset"
)

GLUE (General Language Understanding)

# Load GLUE SST-2 dataset
sst2_dataset = converter.load_hf_dataset(
    dataset_name="glue",
    config="sst2",  # load_hf_dataset's parameter is `config`, passed to load_dataset as `name`
    split="validation",
    subset_size=100
)

# Convert to Opik format
sst2_opik = converter.convert_to_opik_format(
    dataset=sst2_dataset,
    input_columns=["sentence"],
    output_columns=["label"],
    metadata_columns=["idx"],
    dataset_name="sst2-sentiment-dataset",
    description="SST-2 sentiment analysis dataset from GLUE"
)

Common Crawl (Text Classification)

# Load Common Crawl dataset
cc_dataset = converter.load_hf_dataset(
    dataset_name="common_crawl",
    split="train",  # a split is required; without it load_dataset returns a DatasetDict
    subset_size=200
)

# Convert to Opik format
cc_opik = converter.convert_to_opik_format(
    dataset=cc_dataset,
    input_columns=["text"],
    output_columns=["language"],
    metadata_columns=["url", "timestamp"],
    dataset_name="common-crawl-dataset",
    description="Common Crawl text classification dataset"
)

Viewing Results

Once your Hugging Face datasets are converted and uploaded to Opik, you can view them in the Opik UI. Each dataset will contain:

  • Input data from specified columns
  • Expected output from specified columns
  • Metadata from additional columns
  • Source information (Hugging Face dataset name and split; see the note after this list)
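
The converter shown earlier does not attach source information itself; if you want that field, one simple approach (an assumption, not part of the converter above) is to stamp it onto each item's metadata before uploading:

# Tag each converted item with its Hugging Face origin before insert
for item in opik_data:
    item["metadata"]["hf_source"] = {"dataset": "squad", "split": "validation"}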

Feedback Scores and Evaluation

Once your Hugging Face datasets are in Opik, you can evaluate your LLM applications using Opik’s evaluation framework:

from opik.evaluation import evaluate
from opik.evaluation.metrics import Hallucination

# Define your evaluation task: it receives one dataset item and returns the
# fields the scoring metrics expect. Replace the call below with your own model.
def evaluation_task(x):
    result = evaluate_qa_model(x)
    return {
        "input": x["input"]["question"],
        "output": result["response"],
        "reference": str(x["expected_output"]["answers"]),
    }

# Create the Hallucination metric
hallucination_metric = Hallucination()

# Run the evaluation (evaluate expects an Opik dataset object,
# such as the opik_dataset uploaded earlier, not the raw list of items)
evaluation_results = evaluate(
    experiment_name="huggingface-datasets-evaluation",
    dataset=opik_dataset,
    task=evaluation_task,
    scoring_metrics=[hallucination_metric],
)

Environment Variables

Make sure to set the following environment variables:

# Hugging Face Configuration (optional, for private datasets)
$export HUGGINGFACE_HUB_TOKEN="your-huggingface-token"

# Opik Configuration
$export OPIK_PROJECT_NAME="your-project-name"
$export OPIK_WORKSPACE="your-workspace-name"

Troubleshooting

Common Issues

  1. Authentication Errors: Ensure your Hugging Face token is valid and has access to the dataset (required for private datasets)
  2. Dataset Not Found: Verify the dataset name and configuration are correct
  3. Memory Issues: Use the subset_size parameter to limit large datasets
  4. Data Type Conversion: The converter handles most data types, but complex nested structures may need custom handling (see the _extract_field_value sketch above)

Getting Help

Next Steps

Once you have Hugging Face Datasets integrated with Opik, you can: