Log experiments with REST API

A step-by-step guide to using the Experiments REST API to log evaluation results

If you’re working in Python or JavaScript, the easiest way to integrate with Opik is through our official SDKs.

But if your stack includes something else, like Go, Java, Kotlin, or Ruby, no problem! That’s where the REST API comes in: it gives you flexible, language-agnostic access to log and manage your projects and experiments directly in Opik.

This guide shows you how to record experiment results for your LLM application using the Experiments bulk logging API.

The full API reference for the Record experiment items in bulk endpoint is available in the Opik REST API reference.

Endpoint Overview

The Record Experiment Items in Bulk endpoint allows you to log multiple experiment item evaluations in a single request, including optional model outputs, traces, spans, and structured feedback scores.

Method: PUT
URL: /api/v1/private/experiments/items/bulk

Request Size Limit: The maximum allowed payload size is 4MB. For larger submissions, please divide the data into smaller batches.
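One way to stay under the limit is to grow each batch until adding the next item would push the serialized payload past a size cap, then start a new batch. The `chunk_items` helper below is an illustrative sketch, not part of the Opik API; the demo uses a tiny 200-byte cap purely to make the splitting visible.

```python
import json

MAX_BYTES = 4 * 1024 * 1024  # the endpoint's 4MB payload limit

def chunk_items(base_payload, items, max_bytes=MAX_BYTES):
    """Yield bulk payloads whose serialized size stays under max_bytes.

    base_payload holds experiment_name / dataset_name; items is the full
    list of experiment items to spread across requests. A single item
    larger than max_bytes is still emitted on its own (and would be
    rejected by the server).
    """
    batch = []
    for item in items:
        candidate = dict(base_payload, items=batch + [item])
        if batch and len(json.dumps(candidate).encode("utf-8")) > max_bytes:
            yield dict(base_payload, items=batch)
            batch = [item]
        else:
            batch.append(item)
    if batch:
        yield dict(base_payload, items=batch)

# Demo: force tiny batches to show the splitting behavior.
base = {"experiment_name": "my_experiment", "dataset_name": "my_dataset"}
items = [{"dataset_item_id": str(i)} for i in range(10)]
batches = list(chunk_items(base, items, max_bytes=200))
```

Each yielded payload can then be sent as its own PUT request to the bulk endpoint.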

Minimum Required Fields

At a minimum, your request must contain:

  • experiment_name (string): Name of the experiment.
  • dataset_name (string): Name of the dataset the evaluation is tied to.
  • items (list of objects): Each object must include a unique dataset_item_id (UUID).

This minimal structure is sufficient to register the dataset item to the experiment.

Optional Enhancements for Richer Evaluation

Each item can optionally include:

  • evaluate_task_result: A map, list, or string representing the output of your application.
  • trace: An object representing the full execution trace.
Important: you may supply either evaluate_task_result or trace — not both.
  • spans: A list of structured span objects representing sub-steps or stages of execution.
  • feedback_scores: A list of structured objects that describe evaluation signals. Each feedback score includes:
    • name
    • category_name
    • value
    • reason
    • source

Tip: use feedback scores to record evaluations such as accuracy, fluency, or custom criteria from heuristics or human reviewers.
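As a sketch, a feedback score entry can be assembled in Python like this. The `make_feedback_score` helper and its `"sdk"` default source are illustrative, not part of the API; the field names match the list above, and the example values mirror the request payloads later in this guide.

```python
def make_feedback_score(name, value, category_name=None, reason=None, source="sdk"):
    """Build one entry for an item's feedback_scores list.

    name, value, and source are always included; the optional fields
    are dropped when left unset.
    """
    score = {"name": name, "value": value, "source": source}
    if category_name is not None:
        score["category_name"] = category_name
    if reason is not None:
        score["reason"] = reason
    return score

# Mirrors the accuracy score used in the example requests below.
accuracy = make_feedback_score(
    "accuracy", 1, category_name="geography", reason="Correct capital", source="ui"
)
```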

Example Use Cases

Here are a few common ways teams can use the bulk logging endpoint to evaluate their LLM applications effectively:

  • Register a dataset item with minimal fields.
  • Log application responses with evaluate_task_result.
  • Attach feedback scores like accuracy scores or annotations.
  • Enable full experiment observability with traces and spans.

Example Requests

1. Minimal Payload

curl -X PUT 'http://localhost:5173/api/v1/private/experiments/items/bulk' \
  -H 'Content-Type: application/json' \
  -d '{
    "experiment_name": "my_experiment",
    "dataset_name": "my_dataset",
    "items": [
      {
        "dataset_item_id": "4a7c2cfb-1234-4321-aaaa-111111111111"
      }
    ]
  }'

2. Add Model Output

curl -X PUT 'http://localhost:5173/api/v1/private/experiments/items/bulk' \
  -H 'Content-Type: application/json' \
  -d '{
    "experiment_name": "my_experiment",
    "dataset_name": "my_dataset",
    "items": [
      {
        "dataset_item_id": "4a7c2cfb-1234-4321-aaaa-111111111111",
        "evaluate_task_result": {
          "_llm_task_output": "Madrid",
          "explanation": "Predicted capital of Spain"
        }
      }
    ]
  }'

3. Include Feedback Scores

curl -X PUT 'http://localhost:5173/api/v1/private/experiments/items/bulk' \
  -H 'Content-Type: application/json' \
  -d '{
    "experiment_name": "my_experiment",
    "dataset_name": "my_dataset",
    "items": [
      {
        "dataset_item_id": "4a7c2cfb-1234-4321-aaaa-111111111111",
        "evaluate_task_result": {
          "_llm_task_output": "Madrid"
        },
        "feedback_scores": [
          {
            "name": "accuracy",
            "category_name": "geography",
            "value": 1,
            "reason": "Correct capital",
            "source": "ui"
          }
        ]
      }
    ]
  }'

4. Full Payload with Multiple Items - Example in Python

import os

import requests

url = os.getenv('OPIK_URL_OVERRIDE') + "/v1/private/experiments/items/bulk"
headers = {
    "Authorization": os.getenv('OPIK_API_KEY'),
    "Comet-Workspace": os.getenv('OPIK_WORKSPACE'),
    "Content-Type": "application/json"
}

data = {
    "experiment_name": "new_experiment_api_full",
    "dataset_name": "valid-structure",
    "items": [
        {
            "dataset_item_id": "0196ab58-b7aa-7241-8672-965726cb236c",
            "evaluate_task_result": {
                "_llm_task_output": "Madrid",
                "just_field": "Hello from here :)"
            },
            "feedback_scores": [
                {
                    "name": "accuracy",
                    "category_name": "geography",
                    "value": 1,
                    "reason": "Correct city",
                    "source": "ui"
                }
            ]
        },
        {
            "dataset_item_id": "0196ab58-643e-714a-b2bc-b5d93c2889aa",
            "evaluate_task_result": {
                "_llm_task_output": "Kyiv"
            },
            "feedback_scores": [
                {
                    "name": "fluency",
                    "category_name": "language",
                    "value": 1,
                    "reason": "Output was fluent",
                    "source": "ui"
                }
            ]
        },
        {
            "dataset_item_id": "0196ab57-d1ce-72f5-abef-793153fff106",
            "evaluate_task_result": {
                "_llm_task_output": "Paris"
            },
            "feedback_scores": [
                {
                    "name": "accuracy",
                    "category_name": "geography",
                    "value": 1,
                    "reason": "Correct",
                    "source": "ui"
                }
            ]
        }
    ]
}

try:
    response = requests.put(url, headers=headers, json=data)
    print(response.status_code)  # 204 means the items were recorded
    # A successful bulk upload returns an empty body, so only parse JSON
    # when the server actually returned content (e.g. an error message).
    if response.content:
        print(response.json())
except requests.exceptions.RequestException as e:
    print("Request failed:", e)

5. Full Payload with Multiple Items - Example in Java (Using Jackson + HttpClient)

The following example shows how to stream dataset items and log experiment results in bulk using Java.

// Imports used by this snippet; the main method below lives in your own class.
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.JsonNodeFactory;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public static void main(String[] args) {

    // This example uses Jackson to handle JSON serialization and deserialization.
    ObjectMapper mapper = new ObjectMapper();

    String baseURI = System.getenv("OPIK_URL_OVERRIDE"); // e.g., "http://localhost:5173/api"
    String workspaceName = System.getenv("WORKSPACE_NAME");
    String apiKey = System.getenv("API_KEY");

    String datasetName = "my-dataset";
    String experimentName = "my-experiment";

    Map<String, String> requestBody = Map.of("dataset_name", datasetName);

    // HttpClient is AutoCloseable as of Java 21.
    try (var client = HttpClient.newHttpClient()) {

        var request = HttpRequest.newBuilder()
                // Concatenate rather than resolve(): resolving an absolute path
                // would drop the "/api" segment of the base URI.
                .uri(URI.create(baseURI + "/v1/private/datasets/items/stream"))
                .header("Content-Type", "application/json")
                .header("Accept", "application/octet-stream")
                .header("Authorization", apiKey)
                .header("Comet-Workspace", workspaceName)
                .POST(HttpRequest.BodyPublishers.ofString(mapper.writeValueAsString(requestBody)))
                .build();

        HttpResponse<InputStream> response = client.send(
                request,
                HttpResponse.BodyHandlers.ofInputStream()
        );

        List<JsonNode> experimentItems = new ArrayList<>();
        try (var reader = new BufferedReader(new InputStreamReader(response.body()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                JsonNode datasetItem = mapper.readTree(line);
                String question = datasetItem.get("data").get("question").asText();
                UUID datasetItemId = UUID.fromString(datasetItem.get("id").asText());

                // Placeholders for your application call and scoring logic.
                JsonNode evaluationTaskOutput = callTargetApplication(question);
                List<JsonNode> scoresResults = callLlmOrMetricCalc(evaluationTaskOutput);

                ArrayNode scoreResultsArray = JsonNodeFactory.instance.arrayNode().addAll(scoresResults);

                JsonNode experimentItem = JsonNodeFactory.instance.objectNode()
                        .put("dataset_item_id", datasetItemId.toString())
                        .setAll(Map.of(
                                "evaluate_task_result", evaluationTaskOutput,
                                "feedback_scores", scoreResultsArray));

                experimentItems.add(experimentItem);
            }
        }

        var experimentBatchBody = JsonNodeFactory.instance.objectNode()
                .put("dataset_name", datasetName)
                .put("experiment_name", experimentName)
                .setAll(Map.of("items", JsonNodeFactory.instance.arrayNode().addAll(experimentItems)));

        var experimentRequest = HttpRequest.newBuilder()
                .uri(URI.create(baseURI + "/v1/private/experiments/items/bulk"))
                .header("Content-Type", "application/json")
                .header("Authorization", apiKey)
                .header("Comet-Workspace", workspaceName)
                .PUT(HttpRequest.BodyPublishers.ofString(experimentBatchBody.toString()))
                .build();

        HttpResponse<String> experimentResponse = client.send(experimentRequest, HttpResponse.BodyHandlers.ofString());

        if (experimentResponse.statusCode() == 204) {
            System.out.println("Experiment items successfully created.");
        } else {
            System.out.printf("Failed to create experiment items, status code %s body %s",
                    experimentResponse.statusCode(), experimentResponse.body());
        }

    } catch (InterruptedException | IOException e) {
        throw new RuntimeException(e);
    }
}

Authentication

Depending on your deployment, you can access the Experiments REST API either without authentication (local open-source, on-premise setups) or with API key authentication (Opik Cloud). For a local deployment, no authentication headers are required:

curl -X PUT 'http://localhost:5173/api/v1/private/experiments/items/bulk' \
  -H 'Content-Type: application/json' \
  -d '{ ... }'
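For Opik Cloud, the same request also carries the API key and workspace headers used in the Python example above; the base URL below is the Opik Cloud endpoint from the Environment Variables section, and the request body is elided as before:

```shell
curl -X PUT 'https://www.comet.com/opik/api/v1/private/experiments/items/bulk' \
  -H 'Content-Type: application/json' \
  -H "Authorization: $OPIK_API_KEY" \
  -H "Comet-Workspace: $OPIK_WORKSPACE" \
  -d '{ ... }'
```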

Environment Variables

To promote security, flexibility, and reusability, it is recommended to manage authentication credentials using environment variables. This approach prevents hardcoding sensitive information and allows seamless configuration across different environments.

You can define environment variables directly in your system environment or load them via a .env file using tools like dotenv. Alternatively, credentials and other configurations can also be managed through a centralized configuration file, depending on your deployment setup and preference.

export OPIK_API_KEY="your_api_key"
export OPIK_WORKSPACE="your_workspace_name"
export OPIK_URL_OVERRIDE="https://www.comet.com/opik/api"