Experiments

Experiments in Opik link traces (execution data) with dataset items, creating a foundation for evaluation and comparison. They let you track, analyze, and compare the performance of your LLM applications across different versions, models, or configurations.

What are Experiments?

An experiment in Opik connects traces (records of LLM executions) with dataset items, creating a link that enables structured evaluation and analysis. This connection allows you to:

  • Compare different LLM implementations against the same dataset (see the sketch after this list)
  • Evaluate model performance with various metrics
  • Track improvements or regressions over time
  • Analyze feedback scores across different versions
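
For example, to compare two implementations evaluated against the same dataset, you can fetch both experiments and aggregate a feedback score across their items. The sketch below uses only the methods documented on this page and assumes an initialized Opik client named opik (see the setup sketch in the next section); the experiment names and the "answer_relevance" metric name are placeholder assumptions:

// Average a named feedback score across an experiment's items
const averageScore = async (experimentName: string, metricName: string) => {
  const experiment = await opik.getExperiment(experimentName);
  const items = await experiment.getItems();
  const values = items
    .flatMap((item) => item.feedbackScores)
    .filter((score) => score.name === metricName)
    .map((score) => score.value);
  return values.length > 0 ? values.reduce((sum, v) => sum + v, 0) / values.length : 0;
};

const baseline = await averageScore("my-experiment-v1", "answer_relevance");
const candidate = await averageScore("my-experiment-v2", "answer_relevance");
console.log(`answer_relevance: ${baseline} -> ${candidate}`);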

Creating and Managing Experiments

The TypeScript SDK provides several methods to create and manage experiments through the OpikClient class.
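
All snippets on this page assume an initialized client named opik. A minimal setup sketch, assuming the client class is exported as Opik from the opik package and accepts the configuration shape shown (the values are placeholders):

import { Opik } from "opik";

// Create the client once and reuse it across calls
const opik = new Opik({
  apiKey: "<your-api-key>",
  apiUrl: "https://www.comet.com/opik/api",
  projectName: "<your-project>",
  workspaceName: "<your-workspace>",
});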

Getting Experiments

// Get all experiments with a specific name
const experiments = await opik.getExperimentsByName("my-experiment");

// Get a single experiment by name (first match if multiple exist)
const experiment = await opik.getExperiment("my-experiment");

// Get all experiments associated with a dataset
const datasetExperiments = await opik.getDatasetExperiments("my-dataset", 100); // Optional maxResults parameter (default: 100)

Deleting an Experiment

// Delete an experiment by ID
await opik.deleteExperiment("experiment-id");
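
Note that deleteExperiment takes the experiment ID rather than its name. If you only know the name, look the experiment up first; a short sketch using the methods documented on this page:

// Resolve an experiment by name, then delete it by its ID
const experimentToDelete = await opik.getExperiment("my-experiment");
await opik.deleteExperiment(experimentToDelete.id);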

Working with Experiment Items

Experiment items are the core components that link dataset items with traces. These connections enable you to analyze how your LLM application performs on specific inputs.

Creating Experiment Items

import { ExperimentItemReferences } from "opik";

// Get an existing experiment
const experiment = await opik.getExperiment("my-experiment");

// Create references between dataset items and traces
const experimentItems = [
  new ExperimentItemReferences({
    datasetItemId: "dataset-item-1",
    traceId: "trace-id-1",
  }),
  new ExperimentItemReferences({
    datasetItemId: "dataset-item-2",
    traceId: "trace-id-2",
  }),
];

// Insert the experiment items
await experiment.insert(experimentItems);

Retrieving Experiment Items

// Get all items in an experiment
const allItems = await experiment.getItems();

// Get a limited number of items
const limitedItems = await experiment.getItems({ maxResults: 50 });

// Get items with truncated data (improves performance for large datasets)
const truncatedItems = await experiment.getItems({ truncate: true });
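
Each returned item links back to its dataset item and trace, and carries the evaluation output and feedback scores described under Data Structures below. A short sketch that prints these fields for each item (the exact contents depend on your dataset and evaluation task):

// Inspect the linked data on each experiment item
for (const item of allItems) {
  console.log(`Dataset item ${item.datasetItemId} -> trace ${item.traceId}`);
  console.log(`  input:  ${JSON.stringify(item.datasetItemData)}`);
  console.log(`  output: ${JSON.stringify(item.evaluationTaskOutput)}`);
  console.log(`  scores: ${item.feedbackScores.map((s) => `${s.name}=${s.value}`).join(", ")}`);
}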

Experiment URL

Get the URL to view the experiment in the Opik web interface:

const url = await experiment.getUrl();
console.log(`View experiment at: ${url}`);

Data Structures

Experiment

Represents an experiment in Opik that connects traces with dataset items:

class Experiment {
  readonly id: string; // Unique identifier of the experiment
  readonly name?: string; // Optional name of the experiment
  readonly datasetName: string; // Name of the dataset associated with the experiment

  // Creates new experiment items by linking traces with dataset items
  insert(experimentItemReferences: ExperimentItemReferences[]): Promise<void>;

  // Retrieves experiment items with options for pagination and data truncation
  getItems(options?: {
    maxResults?: number; // Maximum number of items to retrieve
    truncate?: boolean; // Whether to truncate large data fields
  }): Promise<ExperimentItemContent[]>;

  // Gets the URL to view the experiment in the Opik web interface
  getUrl(): Promise<string>;
}

ExperimentItemReferences

References connecting a dataset item to a trace:

interface ExperimentItemReferences {
  readonly datasetItemId: string; // ID of the dataset item
  readonly traceId: string; // ID of the trace
}

ExperimentItemContent

Content of an experiment item including evaluation data and feedback scores:

interface ExperimentItemContent {
  readonly id?: string; // Experiment item ID
  readonly datasetItemId: string; // Dataset item ID
  readonly traceId: string; // Trace ID
  readonly datasetItemData?: JsonListStringCompare; // Dataset item data
  readonly evaluationTaskOutput?: JsonListStringCompare; // Evaluation task output
  readonly feedbackScores: FeedbackScore[]; // Feedback scores for the item
}

FeedbackScore

Represents a feedback score for an experiment item:

interface FeedbackScore {
  categoryName: string; // Category of the feedback
  name: string; // Name of the feedback metric
  reason?: string; // Optional reason for the score
  value: number; // Score value
  source: string; // Source of the feedback
}
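
Feedback scores make it easy to flag items for review. A minimal sketch that surfaces items scoring below a threshold; the "accuracy" metric name and the 0.5 cutoff are assumptions you would replace with your own metric:

// Flag experiment items whose "accuracy" score falls below a threshold
const scoredItems = await experiment.getItems();
const flagged = scoredItems.filter((item) =>
  item.feedbackScores.some((score) => score.name === "accuracy" && score.value < 0.5)
);

for (const item of flagged) {
  const score = item.feedbackScores.find((s) => s.name === "accuracy");
  console.log(`Trace ${item.traceId}: ${score?.name}=${score?.value} (${score?.reason ?? "no reason given"})`);
}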