Log experiments with REST API
Evaluating your LLM application gives you confidence in its performance. This guide walks through logging pre-computed evaluation results to Opik using both the Python SDK and REST API.
This guide focuses on logging pre-computed evaluation results. If you’re looking to run evaluations with Opik computing the metrics, refer to the Evaluate Your LLM guide.
The process involves these key steps:
- Create a dataset with your test cases
- Prepare your evaluation results
- Log experiment items in bulk
1. Create a Dataset
First, you’ll need to create a dataset containing your test cases. This dataset will be linked to your experiments.
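As a minimal sketch of the REST route, the snippet below builds (but does not send) the two requests involved in creating a dataset and adding test cases to it. The base URL, the `/api/v1/private/datasets` and `/api/v1/private/datasets/items` paths, and the body fields (`name`, `dataset_name`, `items`, `source`, `data`) are assumptions about the Opik REST API and should be checked against the API reference. `json_request` is a hypothetical helper, not part of any SDK.

```python
import json
import urllib.request

# Assumption: a local open-source deployment; adjust for your install.
BASE_URL = "http://localhost:5173"


def json_request(method: str, path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON request. Hypothetical helper."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


# Assumed endpoint and body for creating the dataset.
create_dataset = json_request(
    "POST",
    "/api/v1/private/datasets",
    {"name": "geography-questions", "description": "Sample test cases"},
)

# Assumed endpoint and body for adding test cases to that dataset.
add_items = json_request(
    "PUT",
    "/api/v1/private/datasets/items",
    {
        "dataset_name": "geography-questions",
        "items": [
            {
                "source": "manual",
                "data": {
                    "question": "What is the capital of France?",
                    "expected_output": "Paris",
                },
            }
        ],
    },
)

# To send for real: urllib.request.urlopen(create_dataset)
```

Building the requests separately from sending them makes it easy to inspect the payload before anything reaches the server.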
Dataset item IDs will be automatically generated if not provided. If you do provide your own IDs, ensure they are in UUID7 format.
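If you do supply your own IDs, they need to be generated as version 7 UUIDs. Python versions before 3.14 have no built-in generator for these, so the sketch below assembles one by hand following the UUIDv7 layout (48-bit millisecond timestamp, version and variant bits, random remainder). `new_uuid7` is a hypothetical helper written for illustration; a maintained library is likely the better choice in production.

```python
import os
import time
import uuid


def new_uuid7() -> uuid.UUID:
    """Build a version 7 UUID: 48-bit Unix ms timestamp plus random bits."""
    timestamp_ms = time.time_ns() // 1_000_000
    random_bits = int.from_bytes(os.urandom(10), "big")  # 80 random bits

    rand_a = (random_bits >> 68) & 0xFFF         # top 12 random bits
    rand_b = random_bits & ((1 << 62) - 1)       # low 62 random bits

    value = (timestamp_ms & ((1 << 48) - 1)) << 80  # bits 127..80: timestamp
    value |= 0x7 << 76                              # bits 79..76: version 7
    value |= rand_a << 64                           # bits 75..64: rand_a
    value |= 0b10 << 62                             # bits 63..62: RFC variant
    value |= rand_b                                 # bits 61..0: rand_b
    return uuid.UUID(int=value)


dataset_item_id = str(new_uuid7())
```

Because the timestamp occupies the most significant bits, IDs generated later sort after earlier ones at millisecond granularity.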
2. Prepare Evaluation Results
Structure your evaluation results with the necessary fields. Each experiment item should include:
- dataset_item_id: The ID of the dataset item being evaluated
- evaluate_task_result: The output from your LLM application
- feedback_scores: Array of evaluation metrics (optional)
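The fields above can be assembled with a small helper. This is a sketch only: `make_experiment_item` is a hypothetical function, and the shape of each feedback score (`name`, `value`, `source`) is an assumption about the API rather than something this guide specifies.

```python
from typing import Optional


def make_experiment_item(
    dataset_item_id: str,
    output: dict,
    scores: Optional[dict] = None,
) -> dict:
    """Build one experiment item from pre-computed results. Hypothetical helper."""
    item = {
        "dataset_item_id": dataset_item_id,
        "evaluate_task_result": output,
    }
    if scores:
        # Assumed feedback score shape: name, numeric value, and a source tag.
        item["feedback_scores"] = [
            {"name": name, "value": float(value), "source": "sdk"}
            for name, value in scores.items()
        ]
    return item


item = make_experiment_item(
    "01932c45-7a1b-7c3d-8e4f-5a6b7c8d9e0f",  # illustrative ID, not a real record
    {"answer": "Paris"},
    {"accuracy": 1},
)
```

Omitting `scores` leaves `feedback_scores` out of the item entirely, which matches the field being optional.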
3. Log Experiment Items in Bulk
Use the bulk endpoint to efficiently log multiple evaluation results at once.
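A minimal sketch of a bulk request, using only the standard library, might look like the following. The endpoint path and the required fields (`experiment_name`, `dataset_name`, `items` with `dataset_item_id`) come from the Reference section of this guide; `build_bulk_request` itself is a hypothetical helper, and the base URL in the usage line assumes a local deployment.

```python
import json
import urllib.request
from typing import Optional

BULK_PATH = "/api/v1/private/experiments/items/bulk"


def build_bulk_request(
    base_url: str,
    experiment_name: str,
    dataset_name: str,
    items: list,
    headers: Optional[dict] = None,
) -> urllib.request.Request:
    """Build (but do not send) the bulk logging request. Hypothetical helper."""
    if not items:
        raise ValueError("items must not be empty")
    missing = [i for i, item in enumerate(items) if "dataset_item_id" not in item]
    if missing:
        raise ValueError(f"items missing dataset_item_id at indexes {missing}")

    body = {
        "experiment_name": experiment_name,
        "dataset_name": dataset_name,
        "items": items,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + BULK_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", **(headers or {})},
        method="PUT",
    )


request = build_bulk_request(
    "http://localhost:5173",  # assumption: local open-source deployment
    "geography-bot-v1",
    "geography-questions",
    [{"dataset_item_id": "01932c45-7a1b-7c3d-8e4f-5a6b7c8d9e0f",
      "evaluate_task_result": {"answer": "Paris"}}],
)
# To send for real: urllib.request.urlopen(request)
```

Checking the required fields before building the request catches malformed items locally instead of relying on a server-side rejection.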
Request Size Limit: The maximum allowed payload size is 4MB. For larger submissions, divide the data into smaller batches.
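One way to stay under the limit is to split items into batches by serialized size rather than by count. The sketch below does this under two assumptions: that the limit applies to the JSON request body, and that "4MB" can be treated conservatively as 4,000,000 bytes. `batch_items` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator

# Assumption: treat the documented 4MB limit conservatively as 4,000,000 bytes.
MAX_PAYLOAD_BYTES = 4_000_000


def batch_items(
    items: Iterable[dict],
    experiment_name: str,
    dataset_name: str,
    limit: int = MAX_PAYLOAD_BYTES,
) -> Iterator[list]:
    """Yield lists of items whose full request body fits within `limit` bytes."""

    def body_size(batch: list) -> int:
        body = {
            "experiment_name": experiment_name,
            "dataset_name": dataset_name,
            "items": batch,
        }
        return len(json.dumps(body).encode("utf-8"))

    batch: list = []
    for item in items:
        # Close the current batch if adding this item would exceed the limit.
        if batch and body_size(batch + [item]) > limit:
            yield batch
            batch = []
        batch.append(item)
        if body_size(batch) > limit:
            raise ValueError("a single item exceeds the payload limit")
    if batch:
        yield batch
```

The helper re-serializes the batch on every step, which is simple but quadratic in batch length; for very large submissions, tracking a running byte count would be cheaper. Each yielded batch would then be sent as its own bulk request.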
Complete Example
Here’s a complete example that puts all the steps together:
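The sketch below strings the three steps together over REST using only the standard library. The base URL, the dataset endpoints, the dataset item and feedback score shapes, and all names and IDs are illustrative assumptions; only the bulk endpoint path and its required fields come from this guide. The `send` parameter is a hypothetical hook that lets you dry-run the flow without a server.

```python
import json
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:5173"  # assumption: local open-source deployment


def send_json(method: str, path: str, payload: dict) -> None:
    """Send one JSON request; urlopen raises HTTPError on an error response."""
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(request) as response:
        response.read()


def log_experiment(send: Callable[[str, str, dict], None] = send_json) -> None:
    dataset_name = "geography-questions"
    # Illustrative test cases with hand-written UUIDv7-style IDs.
    cases = [
        ("01932c45-7a1b-7c3d-8e4f-5a6b7c8d9e0f", "What is the capital of France?", "Paris"),
        ("01932c45-7a1c-7c3d-8e4f-5a6b7c8d9e10", "What is the capital of Japan?", "Tokyo"),
    ]

    # Step 1: create the dataset and add test cases (assumed endpoints).
    send("POST", "/api/v1/private/datasets", {"name": dataset_name})
    send("PUT", "/api/v1/private/datasets/items", {
        "dataset_name": dataset_name,
        "items": [
            {"id": item_id, "source": "manual",
             "data": {"question": question, "expected_output": answer}}
            for item_id, question, answer in cases
        ],
    })

    # Step 2: prepare pre-computed results (placeholder outputs and scores).
    items = [
        {"dataset_item_id": item_id,
         "evaluate_task_result": {"answer": answer},
         "feedback_scores": [{"name": "accuracy", "value": 1.0, "source": "sdk"}]}
        for item_id, _question, answer in cases
    ]

    # Step 3: log all experiment items in one bulk call.
    send("PUT", "/api/v1/private/experiments/items/bulk", {
        "experiment_name": "geography-bot-v1",
        "dataset_name": dataset_name,
        "items": items,
    })
```

Calling `log_experiment()` sends the requests to `BASE_URL`, while passing a recording function such as `lambda m, p, b: print(m, p)` shows the sequence without any network traffic.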
Advanced Usage
Including Traces and Spans
You can include full execution traces with your experiment items for complete observability:
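As a rough sketch, an experiment item carrying a trace and its spans could be shaped as below. The field names inside `trace` and `spans` (`name`, `type`, `input`, `output`, `start_time`, `end_time`) are assumptions about the API schema and should be verified against the REST API reference. `check_item` is a hypothetical helper that enforces the rule that an item carries `evaluate_task_result` or `trace`, never both.

```python
def check_item(item: dict) -> dict:
    """Reject items that set both evaluate_task_result and trace."""
    if "evaluate_task_result" in item and "trace" in item:
        raise ValueError("use evaluate_task_result or trace, not both")
    return item


traced_item = check_item({
    "dataset_item_id": "01932c45-7a1b-7c3d-8e4f-5a6b7c8d9e0f",  # illustrative ID
    # Assumed trace shape: the top-level record of one application run.
    "trace": {
        "name": "geography_query",
        "input": {"question": "What is the capital of France?"},
        "output": {"answer": "Paris"},
        "start_time": "2024-01-01T00:00:00Z",
        "end_time": "2024-01-01T00:00:01Z",
    },
    # Assumed span shape: individual steps inside the trace.
    "spans": [
        {
            "name": "llm_call",
            "type": "llm",
            "input": {"prompt": "What is the capital of France?"},
            "output": {"response": "Paris"},
            "start_time": "2024-01-01T00:00:00Z",
            "end_time": "2024-01-01T00:00:01Z",
        }
    ],
    "feedback_scores": [{"name": "accuracy", "value": 1.0, "source": "sdk"}],
})
```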
Provide either evaluate_task_result or trace for each experiment item, not both.
Java Example
For Java developers, here’s how to integrate with Opik using Jackson and HttpClient:
Authentication
Configure authentication based on your deployment:
Open-Source (No Auth Required)
Opik Cloud
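The two deployment modes differ mainly in request headers. The sketch below covers both in one hypothetical helper, `auth_headers`. It assumes that Opik Cloud expects the API key in an `Authorization` header and the workspace name in a `Comet-Workspace` header; confirm both against the REST API reference before relying on them.

```python
from typing import Optional


def auth_headers(
    api_key: Optional[str] = None,
    workspace: Optional[str] = None,
) -> dict:
    """Return request headers for either deployment mode. Hypothetical helper."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumed Opik Cloud headers; open-source installs need neither.
        headers["Authorization"] = api_key
        if workspace:
            headers["Comet-Workspace"] = workspace
    return headers


open_source_headers = auth_headers()
cloud_headers = auth_headers(api_key="<your-api-key>", workspace="<your-workspace>")
```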
Environment Variables
For security and flexibility, store credentials in environment variables instead of hard-coding them, then read them in your code at runtime.
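A minimal sketch of reading these settings in Python follows. The variable names `OPIK_API_KEY`, `OPIK_WORKSPACE`, and `OPIK_URL_OVERRIDE`, along with the local default URL, are assumptions here; use whatever names your deployment defines. `load_opik_settings` is a hypothetical helper.

```python
import os
from typing import Mapping


def load_opik_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read connection settings from the environment. Hypothetical helper."""
    return {
        # Assumed default: a local open-source deployment.
        "base_url": env.get("OPIK_URL_OVERRIDE", "http://localhost:5173/api").rstrip("/"),
        # Both stay None for deployments that need no authentication.
        "api_key": env.get("OPIK_API_KEY"),
        "workspace": env.get("OPIK_WORKSPACE"),
    }


settings = load_opik_settings()
```

Accepting the environment as a parameter keeps the function easy to test, since a plain dictionary can stand in for `os.environ`.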
Reference
- Endpoint: PUT /api/v1/private/experiments/items/bulk
- Max Payload Size: 4MB
- Required Fields: experiment_name, dataset_name, items (with dataset_item_id)
- SDK Reference: ExperimentsClient.experiment_items_bulk
- REST API Reference: Experiments API