Observability for Hugging Face Datasets with Opik

Hugging Face Datasets is a library that provides easy access to thousands of datasets for machine learning and natural language processing tasks.

This guide explains how to integrate Opik with Hugging Face Datasets to convert and import datasets into Opik for model evaluation and optimization.

Account Setup

Comet provides a hosted version of the Opik platform; simply create an account and grab your API key.

You can also run the Opik platform locally; see the installation guide for more information.

Getting Started

Installation

To use Hugging Face Datasets with Opik, you'll need the datasets and opik packages installed, along with the supporting libraries used in this guide:

$pip install opik datasets transformers pandas tqdm huggingface_hub

Configuring Opik

Configure the Opik Python SDK for your deployment type. See the Python SDK Configuration guide for detailed instructions on:

  • CLI configuration: opik configure
  • Code configuration: opik.configure()
  • Self-hosted vs Cloud vs Enterprise setup
  • Configuration files and environment variables
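
For example, the code-based option can be as small as the following sketch (the api_key and workspace values are placeholders you replace with your own):

import opik

# Opik Cloud: supply your API key and workspace
opik.configure(api_key="YOUR_API_KEY", workspace="YOUR_WORKSPACE")

# For a self-hosted deployment instead:
# opik.configure(use_local=True)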

Configuring Hugging Face

To access private datasets on Hugging Face, you will need a Hugging Face token. You can create and manage your tokens from your Hugging Face account settings at https://huggingface.co/settings/tokens.

You can set it as an environment variable:

$export HUGGINGFACE_HUB_TOKEN="YOUR_TOKEN"

Or set it programmatically:

import os
import getpass

if "HUGGINGFACE_HUB_TOKEN" not in os.environ:
    os.environ["HUGGINGFACE_HUB_TOKEN"] = getpass.getpass("Enter your Hugging Face token: ")

# Set project name for organization
os.environ["OPIK_PROJECT_NAME"] = "huggingface-datasets-integration-demo"

HuggingFaceToOpikConverter

The integration provides a utility class to convert Hugging Face datasets to Opik format:

from datasets import load_dataset, Dataset as HFDataset
from opik import Opik
from typing import Optional, Dict, Any, List
import json
from tqdm import tqdm
import warnings
import numpy as np
import pandas as pd

warnings.filterwarnings('ignore')

class HuggingFaceToOpikConverter:
    """Utility class to convert Hugging Face datasets to Opik format."""

    def __init__(self, opik_client: Opik):
        self.opik_client = opik_client

    def load_hf_dataset(
        self,
        dataset_name: str,
        split: Optional[str] = None,
        config: Optional[str] = None,
        subset_size: Optional[int] = None,
        **kwargs
    ) -> HFDataset:
        """
        Load a dataset from Hugging Face Hub.

        Args:
            dataset_name: Name of the dataset on HF Hub
            split: Specific split to load (train, validation, test)
            config: Configuration/subset of the dataset
            subset_size: Limit the number of samples
            **kwargs: Additional arguments for load_dataset

        Returns:
            Loaded Hugging Face dataset
        """
        print(f"📥 Loading dataset: {dataset_name}")
        if config:
            print(f"   Config: {config}")
        if split:
            print(f"   Split: {split}")

        # Load the dataset
        dataset = load_dataset(
            dataset_name,
            name=config,
            split=split,
            **kwargs
        )

        # Limit dataset size if specified
        if subset_size and len(dataset) > subset_size:
            dataset = dataset.select(range(subset_size))
            print(f"   Limited to {subset_size} samples")

        print(f"   ✅ Loaded {len(dataset)} samples")
        print(f"   Features: {list(dataset.features.keys())}")

        return dataset

Basic Usage

Convert and Upload a Dataset

Here’s how to convert a Hugging Face dataset to Opik format and upload it:

# Initialize the converter
opik_client = Opik()
converter = HuggingFaceToOpikConverter(opik_client)

# Load a dataset from Hugging Face
dataset = converter.load_hf_dataset(
    dataset_name="squad",
    split="validation",
    subset_size=100  # Limit for demo
)

# Convert to Opik format
opik_data = converter.convert_to_opik_format(
    dataset=dataset,
    input_columns=["question"],
    output_columns=["answers"],
    metadata_columns=["id", "title"],
    dataset_name="squad-qa-dataset",
    description="SQuAD question answering dataset converted from Hugging Face"
)

print(f"✅ Converted {len(opik_data)} items to Opik format!")
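
Note that convert_to_opik_format returns a plain list of items, so the data still has to be pushed to Opik before it appears in the UI. A minimal sketch using the SDK's get_or_create_dataset and insert methods (reusing the name and description from the call above):

# Create (or fetch) the Opik dataset and upload the converted items
opik_dataset = opik_client.get_or_create_dataset(
    name="squad-qa-dataset",
    description="SQuAD question answering dataset converted from Hugging Face",
)
opik_dataset.insert(opik_data)
print(f"✅ Uploaded {len(opik_data)} items to Opik")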

Convert to Opik Format

The converter provides a method to transform Hugging Face datasets into Opik’s expected format:

def convert_to_opik_format(
    self,
    dataset: HFDataset,
    input_columns: List[str],
    output_columns: List[str],
    metadata_columns: Optional[List[str]] = None,
    dataset_name: str = "huggingface-dataset",
    description: str = "Dataset converted from Hugging Face"
) -> List[Dict[str, Any]]:
    """
    Convert a Hugging Face dataset to Opik format.

    Args:
        dataset: Hugging Face dataset
        input_columns: List of column names to use as input
        output_columns: List of column names to use as expected output
        metadata_columns: Optional list of columns to include as metadata
        dataset_name: Name for the Opik dataset
        description: Description for the Opik dataset

    Returns:
        List of Opik dataset items
    """
    opik_items = []

    for row in tqdm(dataset, desc="Converting to Opik format"):
        # Extract input data
        input_data = {}
        for col in input_columns:
            if col in dataset.features:
                input_data[col] = self._extract_field_value(row, col)

        # Extract expected output
        expected_output = {}
        for col in output_columns:
            if col in dataset.features:
                expected_output[col] = self._extract_field_value(row, col)

        # Extract metadata
        metadata = {}
        if metadata_columns:
            for col in metadata_columns:
                if col in dataset.features:
                    metadata[col] = self._extract_field_value(row, col)

        # Create Opik dataset item
        item = {
            "input": input_data,
            "expected_output": expected_output,
            "metadata": metadata
        }
        opik_items.append(item)

    return opik_items
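
The extraction loops above rely on a _extract_field_value helper that the snippet does not define. A minimal sketch of what it might look like (an assumption, not the exact implementation) that coerces NumPy values into JSON-friendly Python types:

def _extract_field_value(self, row: Dict[str, Any], col: str) -> Any:
    """Pull a single field from a row, coercing NumPy types to plain Python."""
    value = row[col]
    if isinstance(value, np.generic):  # NumPy scalar -> Python scalar
        return value.item()
    if isinstance(value, np.ndarray):  # NumPy array -> list
        return value.tolist()
    if isinstance(value, (dict, list, str, int, float, bool)) or value is None:
        return value  # already JSON-serializable
    return str(value)  # fall back to a string representation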

Using with @track decorator

Use the @track decorator to create comprehensive traces when working with your converted datasets:

from opik import track

@track
def evaluate_qa_model(dataset_item):
    """Evaluate a Q&A model using Hugging Face dataset."""
    question = dataset_item["input"]["question"]

    # Your model logic here (replace with actual model)
    if 'what' in question.lower():
        response = "This is a question asking for information."
    elif 'how' in question.lower():
        response = "This is a question asking for a process or method."
    else:
        response = "This is a general question that requires analysis."

    return {
        "question": question,
        "response": response,
        "expected": dataset_item["expected_output"],
        "metadata": dataset_item["metadata"]
    }

# Evaluate on your dataset
for item in opik_data[:5]:  # Evaluate first 5 items
    result = evaluate_qa_model(item)
    print(f"Question: {result['question'][:50]}...")

SQuAD (Question Answering)

# Load SQuAD dataset
squad_dataset = converter.load_hf_dataset(
    dataset_name="squad",
    split="validation",
    subset_size=50
)

# Convert to Opik format
squad_opik = converter.convert_to_opik_format(
    dataset=squad_dataset,
    input_columns=["question"],
    output_columns=["answers"],
    metadata_columns=["id", "title"],
    dataset_name="squad-qa-dataset",
    description="SQuAD question answering dataset"
)

GLUE (General Language Understanding)

# Load GLUE SST-2 dataset
sst2_dataset = converter.load_hf_dataset(
    dataset_name="glue",
    config="sst2",  # load_hf_dataset's parameter is `config`, passed to load_dataset as `name`
    split="validation",
    subset_size=100
)

# Convert to Opik format
sst2_opik = converter.convert_to_opik_format(
    dataset=sst2_dataset,
    input_columns=["sentence"],
    output_columns=["label"],
    metadata_columns=["idx"],
    dataset_name="sst2-sentiment-dataset",
    description="SST-2 sentiment analysis dataset from GLUE"
)

Common Crawl (Text Classification)

# Load Common Crawl dataset
cc_dataset = converter.load_hf_dataset(
    dataset_name="common_crawl",
    split="train",  # a split is required; without it load_dataset returns a DatasetDict
    subset_size=200
)

# Convert to Opik format
cc_opik = converter.convert_to_opik_format(
    dataset=cc_dataset,
    input_columns=["text"],
    output_columns=["language"],
    metadata_columns=["url", "timestamp"],
    dataset_name="common-crawl-dataset",
    description="Common Crawl text classification dataset"
)

Viewing Results

Once your Hugging Face datasets are converted and uploaded to Opik, you can view them in the Opik UI. Each dataset will contain:

  • Input data from specified columns
  • Expected output from specified columns
  • Metadata from additional columns
  • Source information (Hugging Face dataset name and split; see the note after this list)
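
The converter shown earlier does not attach source information itself; if you want that field, one simple approach (an assumption, not part of the converter above) is to stamp it onto each item's metadata before uploading:

# Tag each converted item with its Hugging Face origin before insert
for item in opik_data:
    item["metadata"]["hf_source"] = {"dataset": "squad", "split": "validation"}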

Feedback Scores and Evaluation

Once your Hugging Face datasets are in Opik, you can evaluate your LLM applications using Opik’s evaluation framework:

from opik.evaluation import evaluate
from opik.evaluation.metrics import Hallucination

# Define your evaluation task: it receives one dataset item and returns the
# fields the scoring metrics expect. Replace the call below with your own model.
def evaluation_task(x):
    result = evaluate_qa_model(x)
    return {
        "input": x["input"]["question"],
        "output": result["response"],
        "reference": str(x["expected_output"]["answers"]),
    }

# Create the Hallucination metric
hallucination_metric = Hallucination()

# Run the evaluation (evaluate expects an Opik dataset object,
# such as the opik_dataset uploaded earlier, not the raw list of items)
evaluation_results = evaluate(
    experiment_name="huggingface-datasets-evaluation",
    dataset=opik_dataset,
    task=evaluation_task,
    scoring_metrics=[hallucination_metric],
)

Environment Variables

Make sure to set the following environment variables:

# Hugging Face Configuration (optional, for private datasets)
$export HUGGINGFACE_HUB_TOKEN="your-huggingface-token"

# Opik Configuration
$export OPIK_PROJECT_NAME="your-project-name"
$export OPIK_WORKSPACE="your-workspace-name"

Troubleshooting

Common Issues

  1. Authentication Errors: Ensure your Hugging Face token is valid and has access to the dataset (required for private datasets)
  2. Dataset Not Found: Verify the dataset name and configuration are correct
  3. Memory Issues: Use the subset_size parameter to limit large datasets
  4. Data Type Conversion: The converter handles most data types, but complex nested structures may need custom handling (see the _extract_field_value sketch above)

Getting Help

Next Steps

Once you have Hugging Face Datasets integrated with Opik, you can: