Re-running an existing experiment

Guides you through the process of updating an existing experiment

In Opik 2.0, datasets and experiments are project-scoped. Make sure to specify a project_name when creating datasets and running experiments so they are associated with the correct project.

You can update existing experiments in several ways:

  1. Update experiment name and configuration - Change the experiment name or update its configuration metadata
  2. Update experiment scores - Add new scores or re-compute existing scores for experiment items

Update Experiment Name and Configuration

You can update an experiment’s name and configuration from both the Opik UI and the SDKs.

From the Opik UI

To update an experiment from the UI:

  1. Navigate to the Experiments page
  2. Find the experiment you want to update
  3. Click the … menu button on the experiment row
  4. Select Edit from the dropdown menu
  5. Update the experiment name and/or configuration (JSON format)
  6. Click Update Experiment to save your changes

The configuration is stored as JSON and is useful for tracking parameters like model names, temperatures, prompt templates, or any other metadata relevant to your experiment.
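For example, a configuration for a prompt-engineering experiment might look like the following (the field names are illustrative, not a required schema):

```json
{
  "model": "gpt-4",
  "temperature": 0.7,
  "prompt_template": "Answer the following question: {question}"
}
```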

From the Python SDK

Use the update_experiment method to update an experiment’s name and configuration:

```python
import opik

client = opik.Opik()

# Update experiment name
client.update_experiment(
    id="experiment-id",
    name="Updated Experiment Name"
)

# Update experiment configuration
client.update_experiment(
    id="experiment-id",
    experiment_config={
        "model": "gpt-4",
        "temperature": 0.7,
        "prompt_template": "Answer the following question: {question}"
    }
)

# Update both name and configuration
client.update_experiment(
    id="experiment-id",
    name="Updated Experiment Name",
    experiment_config={
        "model": "gpt-4",
        "temperature": 0.7
    }
)
```

From the TypeScript SDK

Use the updateExperiment method to update an experiment’s name and configuration:

```typescript
import { Opik } from "opik";

const opik = new Opik();

// Update experiment name
await opik.updateExperiment("experiment-id", {
  name: "Updated Experiment Name"
});

// Update experiment configuration
await opik.updateExperiment("experiment-id", {
  experimentConfig: {
    model: "gpt-4",
    temperature: 0.7,
    promptTemplate: "Answer the following question: {question}"
  }
});

// Update both name and configuration
await opik.updateExperiment("experiment-id", {
  name: "Updated Experiment Name",
  experimentConfig: {
    model: "gpt-4",
    temperature: 0.7
  }
});
```

Update Experiment Scores

Sometimes you may want to add new scores to an existing experiment, or re-compute its existing scores. You can do this using the evaluate_experiment function.

This function will re-run the scoring metrics on the existing experiment items and update the scores:

```python
from opik.evaluation import evaluate_experiment
from opik.evaluation.metrics import Hallucination

hallucination_metric = Hallucination()

# Replace "my-experiment" with the name of your experiment, which can be found in the Opik UI
evaluate_experiment(experiment_name="my-experiment", scoring_metrics=[hallucination_metric])
```

The evaluate_experiment function can be used to update existing scores for an experiment. If you use a scoring metric with the same name as an existing score, the scores will be updated with the new values.
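Conceptually, this update behaves like a merge keyed on the metric name: a metric whose name matches an existing score overwrites that score, and a new metric name adds a new score. The following sketch illustrates the behavior only; it is not Opik's actual implementation:

```python
# Illustration only: how name-keyed score updates behave conceptually.
# Existing scores for one experiment item, keyed by metric name.
existing_scores = {"hallucination_metric": 0.2}

# Results from re-running evaluate_experiment: a metric with the same
# name replaces the old score, a new metric name adds a new score.
new_results = {"hallucination_metric": 0.1, "moderation_metric": 0.9}

existing_scores.update(new_results)
print(existing_scores)
# {'hallucination_metric': 0.1, 'moderation_metric': 0.9}
```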

You can also compute experiment-level aggregate metrics when updating experiments using the experiment_scoring_functions parameter. Learn more about experiment-level metrics.

Example

Create an experiment

Suppose you are building a chatbot and want to compute the hallucination scores for a set of example conversations. For this you would create a first experiment with the evaluate function:

```python
import opik
from opik import Opik, track
from opik.evaluation import evaluate
from opik.evaluation.metrics import Hallucination
from opik.integrations.openai import track_openai
import openai

opik.configure(project_name="my-project")

# Define the task to evaluate
openai_client = track_openai(openai.OpenAI())

MODEL = "gpt-3.5-turbo"

@track
def your_llm_application(input: str) -> str:
    response = openai_client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": input}],
    )

    return response.choices[0].message.content

# Define the evaluation task
def evaluation_task(x):
    return {
        "output": your_llm_application(x["input"])
    }

# Create a simple dataset
client = Opik()
dataset = client.get_or_create_dataset(name="Existing experiment dataset", project_name="my-project")
dataset.insert([
    {"input": "What is the capital of France?"},
    {"input": "What is the capital of Germany?"},
])

# Define the metrics
hallucination_metric = Hallucination()

evaluation = evaluate(
    experiment_name="Existing experiment example",
    dataset=dataset,
    task=evaluation_task,
    scoring_metrics=[hallucination_metric],
    project_name="my-project",
    experiment_config={
        "model": MODEL
    }
)

experiment_name = evaluation.experiment_name
print(f"Experiment name: {experiment_name}")
```

Learn more about the evaluate function in our LLM evaluation guide.

Update the experiment

Once the first experiment is created, you realise that you also want to compute a moderation score for each example. You could re-run the evaluation with the new scoring metric, but that would mean re-computing all of the LLM outputs. Instead, you can simply update the existing experiment with the new scoring metric:

```python
from opik.evaluation import evaluate_experiment
from opik.evaluation.metrics import Moderation

moderation_metric = Moderation()

# Use the experiment name printed when the experiment was created
evaluate_experiment(experiment_name="Existing experiment example", scoring_metrics=[moderation_metric])
```