Manually logging experiments

Step-by-step guide to logging evaluation results using the TypeScript SDK and REST API

Evaluating your LLM application gives you confidence in its performance. In this guide, we walk through manually creating experiments from evaluation results you have already computed.

This guide focuses on logging pre-computed evaluation results. If you’re looking to run evaluations with Opik computing the metrics, refer to the Evaluate your agent and Evaluate single prompts guides.

The process involves these key steps:

  1. Create a dataset with your test cases
  2. Prepare your evaluation results
  3. Log experiment items in bulk

1. Create a Dataset

First, you’ll need to create a dataset containing your test cases. This dataset will be linked to your experiments.

import { Opik } from "opik";

const client = new Opik({
  apiKey: "your-api-key",
  apiUrl: "https://www.comet.com/opik/api",
  projectName: "your-project-name",
  workspaceName: "your-workspace-name",
});

const dataset = await client.getOrCreateDataset("geography-questions");

await dataset.insert([
  {
    user_question: "What is the capital of France?",
    expected_output: "Paris"
  },
  {
    user_question: "What is the capital of Japan?",
    expected_output: "Tokyo"
  },
  {
    user_question: "What is the capital of Brazil?",
    expected_output: "Brasília"
  }
]);

Dataset item IDs will be automatically generated if not provided. If you do provide your own IDs, ensure they are in UUIDv7 format.
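If you want to control the IDs yourself, you can generate UUIDv7 values and pass them in the id field when inserting items. The sketch below assumes the uuid npm package (version 10 or later, which ships a v7 generator); any UUIDv7 generator would work the same way.

import { v7 as uuidv7 } from "uuid";

// Supplying an explicit UUIDv7 id; Opik will use it instead of generating one
await dataset.insert([
  {
    id: uuidv7(),
    user_question: "What is the capital of Germany?",
    expected_output: "Berlin"
  }
]);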

2. Prepare Evaluation Results

Structure your evaluation results with the necessary fields. Each experiment item should include:

  • dataset_item_id: The ID of the dataset item being evaluated
  • evaluate_task_result: The output from your LLM application
  • feedback_scores: Array of evaluation metrics (optional)

const datasetItems = await dataset.getItems();

const mockResponses = {
  "What is the capital of France?": "The capital of France is Paris.",
  "What is the capital of Japan?": "Japan's capital is Tokyo.",
  "What is the capital of Brazil?": "The capital of Brazil is Rio de Janeiro."
};

// This would be replaced by your specific logic, the goal is simply to have an array of
// evaluation items with a dataset_item_id, evaluate_task_result and feedback_scores
const evaluationItems = datasetItems.map(item => {
  const response = mockResponses[item.user_question] || "I don't know";
  return {
    dataset_item_id: item.id,
    evaluate_task_result: { prediction: response },
    feedback_scores: [{ name: "accuracy", value: response.includes(item.expected_output) ? 1.0 : 0.0, source: "sdk" }]
  };
});

3. Log Experiment Items in Bulk

Use the bulk endpoint to efficiently log multiple evaluation results at once.

import { Opik } from "opik";

const client = new Opik({
  apiKey: "your-api-key",
  apiUrl: "https://www.comet.com/opik/api",
  projectName: "your-project-name",
  workspaceName: "your-workspace-name",
});

const experimentName = "Bulk experiment upload";
const datasetName = "geography-questions";
const items = [
  {
    datasetItemId: "dataset-item-id-1",
    evaluateTaskResult: { prediction: "The capital of France is Paris." },
    feedbackScores: [{ name: "accuracy", value: 1.0, source: "sdk" }]
  }
];

await client.api.experiments.experimentItemsBulk({ experimentName, datasetName, items });

Request Size Limit: The maximum allowed payload size is 4MB. For larger submissions, divide the data into smaller batches.

If you divide the data into smaller batches, add the experiment_id to each payload so that every batch appends its experiment items to the same existing experiment.

Below is an example of splitting evaluationItems into two batches that are both added to the same experiment:

import { generateId } from "opik";

const experimentId = generateId();
const experimentName = "Bulk experiment upload";

// Split evaluationItems into two batches
const mid = Math.floor(evaluationItems.length / 2);

const halves = [
  evaluationItems.slice(0, mid),
  evaluationItems.slice(mid)
];

for (const half of halves) {
  await client.api.experiments.experimentItemsBulk({
    experimentId: experimentId,
    experimentName: experimentName,
    datasetName: "geography-questions",
    items: half.map(item => ({
      datasetItemId: item.dataset_item_id,
      evaluateTaskResult: item.evaluate_task_result,
      feedbackScores: item.feedback_scores.map(score => ({
        ...score,
        source: "sdk"
      }))
    }))
  });
}
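
If your items vary widely in size, splitting into a fixed number of batches can still overshoot the 4MB limit. The sketch below is one way to chunk by approximate serialized size instead; the 3MB threshold is an arbitrary safety margin, and it reuses the client, experimentId, experimentName and evaluationItems from the examples above.

// Group items into chunks whose serialized size stays safely below the 4MB limit
const MAX_CHUNK_BYTES = 3 * 1024 * 1024; // headroom below the 4MB cap

const chunks: (typeof evaluationItems)[] = [[]];
let currentSize = 0;

for (const item of evaluationItems) {
  const itemSize = Buffer.byteLength(JSON.stringify(item), "utf8");
  if (currentSize + itemSize > MAX_CHUNK_BYTES && chunks[chunks.length - 1].length > 0) {
    chunks.push([]);
    currentSize = 0;
  }
  chunks[chunks.length - 1].push(item);
  currentSize += itemSize;
}

for (const chunk of chunks) {
  await client.api.experiments.experimentItemsBulk({
    experimentId,
    experimentName,
    datasetName: "geography-questions",
    items: chunk.map(item => ({
      datasetItemId: item.dataset_item_id,
      evaluateTaskResult: item.evaluate_task_result,
      feedbackScores: item.feedback_scores
    }))
  });
}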

4. Analyze the Results

Once you have logged your experiment items, you can analyze the results in the Opik UI and even compare different experiments to one another.

Complete Example

Here’s a complete example that puts all the steps together:

import { Opik } from "opik";

// Configure Opik
const client = new Opik({
  apiKey: "your-api-key",
  apiUrl: "https://www.comet.com/opik/api",
  projectName: "your-project-name",
  workspaceName: "your-workspace-name",
});

// Step 1: Create dataset
const dataset = await client.getOrCreateDataset("geography-questions");

const localDatasetItems = [
  {
    user_question: "What is the capital of France?",
    expected_output: "Paris"
  },
  {
    user_question: "What is the capital of Japan?",
    expected_output: "Tokyo"
  }
];

await dataset.insert(localDatasetItems);

// Step 2: Get dataset items and prepare evaluation results
const datasetItems = await dataset.getItems();

// Helper function to get dataset item ID
const getDatasetItem = (country: string) => {
  return datasetItems.find(item =>
    item.user_question.toLowerCase().includes(country.toLowerCase())
  );
};

// Prepare evaluation results
const evaluationItems = [
  {
    dataset_item_id: getDatasetItem("France")?.id,
    evaluate_task_result: { prediction: "The capital of France is Paris." },
    feedback_scores: [{ name: "accuracy", value: 1.0 }]
  },
  {
    dataset_item_id: getDatasetItem("Japan")?.id,
    evaluate_task_result: { prediction: "Japan's capital is Tokyo." },
    feedback_scores: [{ name: "accuracy", value: 1.0 }]
  }
];

// Step 3: Log experiment results
const experimentName = `geography-bot-${Math.random().toString(36).substr(2, 4)}`;

await client.api.experiments.experimentItemsBulk({
  experimentName,
  datasetName: "geography-questions",
  items: evaluationItems.map(item => ({
    datasetItemId: item.dataset_item_id,
    evaluateTaskResult: item.evaluate_task_result,
    feedbackScores: item.feedback_scores.map(score => ({
      ...score,
      source: "sdk"
    }))
  }))
});

console.log(`Experiment '${experimentName}' created successfully!`);

Advanced Usage

Including Traces and Spans

You can include full execution traces with your experiment items for complete observability. To achieve this, add trace and spans fields to your experiment items:

[
  {
    "dataset_item_id": "your-dataset-item-id",
    "trace": {
      "name": "geography_query",
      "input": { "question": "What is the capital of France?" },
      "output": { "answer": "Paris" },
      "metadata": { "model": "gpt-3.5-turbo" },
      "start_time": "2024-01-01T00:00:00Z",
      "end_time": "2024-01-01T00:00:01Z"
    },
    "spans": [
      {
        "name": "llm_call",
        "type": "llm",
        "start_time": "2024-01-01T00:00:00Z",
        "end_time": "2024-01-01T00:00:01Z",
        "input": { "prompt": "What is the capital of France?" },
        "output": { "response": "Paris" }
      }
    ],
    "feedback_scores": [{ "name": "accuracy", "value": 1.0, "source": "sdk" }]
  }
]

Important: You may supply either evaluate_task_result or trace — not both.
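
If your items are assembled from mixed sources, it can help to normalize them before the bulk call so this rule is never violated. A small hypothetical helper, keeping the trace and dropping evaluate_task_result when both are present:

// Hypothetical shape of a bulk item, reduced to the fields used in this guide
type BulkItem = {
  dataset_item_id: string;
  evaluate_task_result?: Record<string, unknown>;
  trace?: Record<string, unknown>;
  spans?: Record<string, unknown>[];
  feedback_scores?: { name: string; value: number; source?: string }[];
};

// If both fields are set, keep the trace and drop evaluate_task_result so only one is sent
const normalizeItem = (item: BulkItem): BulkItem => {
  if (item.trace && item.evaluate_task_result) {
    const { evaluate_task_result, ...rest } = item;
    return rest;
  }
  return item;
};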

Java Example

For Java developers, here’s how to integrate with Opik using Jackson and HttpClient:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.JsonNodeFactory;

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class OpikExperimentLogger {

    public static void main(String[] args) {
        ObjectMapper mapper = new ObjectMapper();

        String baseURI = System.getenv("OPIK_URL_OVERRIDE");
        String workspaceName = System.getenv("OPIK_WORKSPACE");
        String apiKey = System.getenv("OPIK_API_KEY");

        String datasetName = "geography-questions";
        String experimentName = "geography-bot-v1";

        try (var client = HttpClient.newHttpClient()) {
            // Stream dataset items
            var streamRequest = HttpRequest.newBuilder()
                    .uri(URI.create(baseURI).resolve("/v1/private/datasets/items/stream"))
                    .header("Content-Type", "application/json")
                    .header("Accept", "application/octet-stream")
                    .header("Authorization", apiKey)
                    .header("Comet-Workspace", workspaceName)
                    .POST(HttpRequest.BodyPublishers.ofString(
                            mapper.writeValueAsString(Map.of("dataset_name", datasetName))
                    ))
                    .build();

            HttpResponse<InputStream> streamResponse = client.send(
                    streamRequest,
                    HttpResponse.BodyHandlers.ofInputStream()
            );

            List<JsonNode> experimentItems = new ArrayList<>();

            try (var reader = new BufferedReader(new InputStreamReader(streamResponse.body()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    JsonNode datasetItem = mapper.readTree(line);
                    String question = datasetItem.get("data").get("user_question").asText();
                    UUID datasetItemId = UUID.fromString(datasetItem.get("id").asText());

                    // Call your LLM application
                    JsonNode llmOutput = callYourLLM(question);

                    // Calculate metrics
                    List<JsonNode> scores = calculateMetrics(llmOutput);

                    // Build experiment item
                    ArrayNode scoresArray = JsonNodeFactory.instance.arrayNode().addAll(scores);
                    JsonNode experimentItem = JsonNodeFactory.instance.objectNode()
                            .put("dataset_item_id", datasetItemId.toString())
                            .setAll(Map.of(
                                    "evaluate_task_result", llmOutput,
                                    "feedback_scores", scoresArray
                            ));

                    experimentItems.add(experimentItem);
                }
            }

            // Send experiment results in bulk
            var bulkBody = JsonNodeFactory.instance.objectNode()
                    .put("dataset_name", datasetName)
                    .put("experiment_name", experimentName)
                    .setAll(Map.of("items",
                            JsonNodeFactory.instance.arrayNode().addAll(experimentItems)
                    ));

            var bulkRequest = HttpRequest.newBuilder()
                    .uri(URI.create(baseURI).resolve("/v1/private/experiments/items/bulk"))
                    .header("Content-Type", "application/json")
                    .header("Authorization", apiKey)
                    .header("Comet-Workspace", workspaceName)
                    .PUT(HttpRequest.BodyPublishers.ofString(bulkBody.toString()))
                    .build();

            HttpResponse<String> bulkResponse = client.send(
                    bulkRequest,
                    HttpResponse.BodyHandlers.ofString()
            );

            if (bulkResponse.statusCode() == 204) {
                System.out.println("Experiment items successfully created.");
            } else {
                System.err.printf("Failed to create experiment items: %s %s",
                        bulkResponse.statusCode(), bulkResponse.body());
            }

        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Placeholder: replace with a call to your LLM application
    private static JsonNode callYourLLM(String question) {
        return JsonNodeFactory.instance.objectNode()
                .put("prediction", "Replace with your model's answer to: " + question);
    }

    // Placeholder: replace with your own metric calculation
    private static List<JsonNode> calculateMetrics(JsonNode llmOutput) {
        JsonNode score = JsonNodeFactory.instance.objectNode()
                .put("name", "accuracy")
                .put("value", 1.0)
                .put("source", "sdk");
        return List.of(score);
    }
}

Using the REST API with local deployments

If you are using the REST API with a local deployment, you can call the endpoints without any authentication headers:

# No authentication headers required for local deployments
curl -X PUT 'http://localhost:5173/api/v1/private/experiments/items/bulk' \
  -H 'Content-Type: application/json' \
  -d '{ ... }'
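
The request body for this endpoint uses the same snake_case fields shown in the Java example above. A minimal example payload (the dataset item ID is a placeholder):

{
  "experiment_name": "geography-bot-v1",
  "dataset_name": "geography-questions",
  "items": [
    {
      "dataset_item_id": "your-dataset-item-id",
      "evaluate_task_result": { "prediction": "The capital of France is Paris." },
      "feedback_scores": [{ "name": "accuracy", "value": 1.0, "source": "sdk" }]
    }
  ]
}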

Reference