Experiments

Experiments in Opik link traces (execution data) with dataset items, creating a foundation for evaluation and comparison. They let you track, analyze, and compare the performance of your LLM applications across different versions, models, or configurations.

What are Experiments?

An experiment in Opik connects traces (records of LLM executions) with dataset items, creating a link that enables structured evaluation and analysis. This connection allows you to:

  • Compare different LLM implementations against the same dataset (see the sketch after this list)
  • Evaluate model performance with various metrics
  • Track improvements or regressions over time
  • Analyze feedback scores across different versions
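
For example, to compare two implementations evaluated against the same dataset, you can fetch both experiments and aggregate a feedback score across their items. The sketch below uses only the methods documented on this page and assumes an initialized Opik client named opik (see the setup sketch in the next section); the experiment names and the "answer_relevance" metric name are placeholder assumptions:

// Average a named feedback score across an experiment's items
const averageScore = async (experimentName: string, metricName: string) => {
  const experiment = await opik.getExperiment(experimentName);
  const items = await experiment.getItems();
  const values = items
    .flatMap((item) => item.feedbackScores)
    .filter((score) => score.name === metricName)
    .map((score) => score.value);
  return values.length > 0 ? values.reduce((sum, v) => sum + v, 0) / values.length : 0;
};

const baseline = await averageScore("my-experiment-v1", "answer_relevance");
const candidate = await averageScore("my-experiment-v2", "answer_relevance");
console.log(`answer_relevance: ${baseline} -> ${candidate}`);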

Creating and Managing Experiments

The TypeScript SDK provides several methods to create and manage experiments through the OpikClient class.
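
All snippets on this page assume an initialized client named opik. A minimal setup sketch, assuming the client class is exported as Opik from the opik package and accepts the configuration shape shown (the values are placeholders):

import { Opik } from "opik";

// Create the client once and reuse it across calls
const opik = new Opik({
  apiKey: "<your-api-key>",
  apiUrl: "https://www.comet.com/opik/api",
  projectName: "<your-project>",
  workspaceName: "<your-workspace>",
});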

Getting Experiments

// Get all experiments with a specific name
const experiments = await opik.getExperimentsByName("my-experiment");

// Get a single experiment by name (first match if multiple exist)
const experiment = await opik.getExperiment("my-experiment");

// Get all experiments associated with a dataset
const datasetExperiments = await opik.getDatasetExperiments("my-dataset", 100); // Optional maxResults parameter (default: 100)

Deleting an Experiment

// Delete an experiment by ID
await opik.deleteExperiment("experiment-id");
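
Note that deleteExperiment takes the experiment ID rather than its name. If you only know the name, look the experiment up first; a short sketch using the methods documented on this page:

// Resolve an experiment by name, then delete it by its ID
const experimentToDelete = await opik.getExperiment("my-experiment");
await opik.deleteExperiment(experimentToDelete.id);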

Working with Experiment Items

Experiment items are the core components that link dataset items with traces. These connections enable you to analyze how your LLM application performs on specific inputs.

Creating Experiment Items

import { ExperimentItemReferences } from "opik";

// Get an existing experiment
const experiment = await opik.getExperiment("my-experiment");

// Create references between dataset items and traces
const experimentItems = [
  new ExperimentItemReferences({
    datasetItemId: "dataset-item-1",
    traceId: "trace-id-1",
  }),
  new ExperimentItemReferences({
    datasetItemId: "dataset-item-2",
    traceId: "trace-id-2",
  }),
];

// Insert the experiment items
await experiment.insert(experimentItems);

Retrieving Experiment Items

// Get all items in an experiment
const allItems = await experiment.getItems();

// Get a limited number of items
const limitedItems = await experiment.getItems({ maxResults: 50 });

// Get items with truncated data (improves performance for large datasets)
const truncatedItems = await experiment.getItems({ truncate: true });
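
Each returned item links back to its dataset item and trace, and carries the evaluation output and feedback scores described under Data Structures below. A short sketch that prints these fields for each item (the exact contents depend on your dataset and evaluation task):

// Inspect the linked data on each experiment item
for (const item of allItems) {
  console.log(`Dataset item ${item.datasetItemId} -> trace ${item.traceId}`);
  console.log(`  input:  ${JSON.stringify(item.datasetItemData)}`);
  console.log(`  output: ${JSON.stringify(item.evaluationTaskOutput)}`);
  console.log(`  scores: ${item.feedbackScores.map((s) => `${s.name}=${s.value}`).join(", ")}`);
}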

Experiment URL

Get the URL to view the experiment in the Opik web interface:

const url = await experiment.getUrl();
console.log(`View experiment at: ${url}`);

Data Structures

Experiment

Represents an experiment in Opik that connects traces with dataset items:

class Experiment {
  readonly id: string; // Unique identifier of the experiment
  readonly name?: string; // Optional name of the experiment
  readonly datasetName: string; // Name of the dataset associated with the experiment

  // Creates new experiment items by linking traces with dataset items
  insert(experimentItemReferences: ExperimentItemReferences[]): Promise<void>;

  // Retrieves experiment items with options for pagination and data truncation
  getItems(options?: {
    maxResults?: number; // Maximum number of items to retrieve
    truncate?: boolean; // Whether to truncate large data fields
  }): Promise<ExperimentItemContent[]>;

  // Gets the URL to view the experiment in the Opik web interface
  getUrl(): Promise<string>;
}

ExperimentItemReferences

References connecting a dataset item to a trace:

interface ExperimentItemReferences {
  readonly datasetItemId: string; // ID of the dataset item
  readonly traceId: string; // ID of the trace
}

ExperimentItemContent

Content of an experiment item including evaluation data and feedback scores:

interface ExperimentItemContent {
  readonly id?: string; // Experiment item ID
  readonly datasetItemId: string; // Dataset item ID
  readonly traceId: string; // Trace ID
  readonly datasetItemData?: JsonListStringCompare; // Dataset item data
  readonly evaluationTaskOutput?: JsonListStringCompare; // Evaluation task output
  readonly feedbackScores: FeedbackScore[]; // Feedback scores for the item
}

FeedbackScore

Represents a feedback score for an experiment item:

interface FeedbackScore {
  categoryName: string; // Category of the feedback
  name: string; // Name of the feedback metric
  reason?: string; // Optional reason for the score
  value: number; // Score value
  source: string; // Source of the feedback
}
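
Feedback scores make it easy to flag items for review. A minimal sketch that surfaces items scoring below a threshold; the "accuracy" metric name and the 0.5 cutoff are assumptions you would replace with your own metric:

// Flag experiment items whose "accuracy" score falls below a threshold
const scoredItems = await experiment.getItems();
const flagged = scoredItems.filter((item) =>
  item.feedbackScores.some((score) => score.name === "accuracy" && score.value < 0.5)
);

for (const item of flagged) {
  const score = item.feedbackScores.find((s) => s.name === "accuracy");
  console.log(`Trace ${item.traceId}: ${score?.name}=${score?.value} (${score?.reason ?? "no reason given"})`);
}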