In Opik 2.0, datasets and experiments are project-scoped. Make sure to specify a project_name when creating datasets and running experiments so they are associated with the correct project.
Evaluating your LLM application gives you confidence in its performance. In this guide, we will walk through manually creating experiments using data you have already computed.
This guide focuses on logging pre-computed evaluation results. If you’re looking to run evaluations with Opik computing the metrics, refer to the Evaluate your agent guide.
The process involves three key steps: creating a dataset, preparing your evaluation data, and logging the experiment items in bulk.
First, you’ll need to create a dataset containing your test cases. This dataset will be linked to your experiments.
Dataset item IDs will be automatically generated if not provided. If you do provide your own IDs, ensure they are in UUID7 format.
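If you do supply your own IDs, they need to follow the UUID version 7 layout (RFC 9562): a 48-bit Unix millisecond timestamp followed by version, variant, and random bits. The sketch below builds such an ID using only the Python standard library. The `make_dataset_item` helper and its `id`/`data` item shape are illustrative assumptions, not a confirmed Opik schema.

```python
import os
import time
import uuid


def uuid7() -> uuid.UUID:
    """Return a UUID version 7 following the RFC 9562 bit layout."""
    unix_ms = time.time_ns() // 1_000_000          # millisecond timestamp
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    rand_a = rand >> 68                            # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 random bits

    value = (unix_ms & ((1 << 48) - 1)) << 80      # bits 127..80: timestamp
    value |= 0x7 << 76                             # bits 79..76: version 7
    value |= rand_a << 64                          # bits 75..64: rand_a
    value |= 0b10 << 62                            # bits 63..62: RFC variant
    value |= rand_b                                # bits 61..0: rand_b
    return uuid.UUID(int=value)


def make_dataset_item(data: dict) -> dict:
    # Hypothetical item shape: an "id" plus the test case under "data".
    return {"id": str(uuid7()), "data": data}


item = make_dataset_item({"question": "What is the capital of France?"})
print(item["id"])
```

Because the timestamp occupies the most significant bits, IDs generated this way sort roughly by creation time.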
Structure your evaluation results with the necessary fields. Each experiment item should include:
- dataset_item_id: The ID of the dataset item being evaluated
- evaluate_task_result: The output from your LLM application
- feedback_scores: Array of evaluation metrics (optional)

Use the bulk endpoint to efficiently log multiple evaluation results at once.
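As a rough sketch of how these fields fit together, the helpers below assemble experiment items and wrap them in a bulk payload. The top-level field names come from this guide. The inner shape of `evaluate_task_result` and of each feedback score (`name`, `value`, `source`), as well as the sample ID, are assumptions made for illustration.

```python
import json


def build_experiment_item(dataset_item_id, output, feedback_scores=None):
    """Assemble one experiment item from a pre-computed result."""
    item = {
        "dataset_item_id": dataset_item_id,
        "evaluate_task_result": output,
    }
    if feedback_scores:  # feedback_scores is optional, so omit it when empty
        item["feedback_scores"] = feedback_scores
    return item


def build_bulk_payload(experiment_name, dataset_name, items):
    """Wrap experiment items in the body expected by the bulk endpoint."""
    return {
        "experiment_name": experiment_name,
        "dataset_name": dataset_name,
        "items": items,
    }


# Hypothetical values throughout: the ID, names, and score shape are placeholders.
items = [
    build_experiment_item(
        "0192f3a4-0000-7000-8000-000000000001",
        {"prediction": "Paris"},
        [{"name": "accuracy", "value": 1.0, "source": "sdk"}],
    ),
    build_experiment_item(
        "0192f3a4-0000-7000-8000-000000000002",
        {"prediction": "Berlin"},
    ),
]
payload = build_bulk_payload("my-experiment", "my-dataset", items)
print(json.dumps(payload, indent=2))
```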
Request Size Limit: The maximum allowed payload size is 4MB. For larger submissions, divide the data into smaller batches.
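One way to stay under the limit is to group items greedily by their serialized size. The sketch below assumes that the limit applies to the JSON-encoded request body and that "4MB" means 4 × 1024 × 1024 bytes. The function names are hypothetical.

```python
import json

MAX_PAYLOAD_BYTES = 4 * 1024 * 1024  # assumption: binary megabytes


def payload_size(obj) -> int:
    """Size in bytes of the JSON-encoded object."""
    return len(json.dumps(obj).encode("utf-8"))


def split_items_by_size(items, envelope, max_bytes=MAX_PAYLOAD_BYTES):
    """Greedily group items so each serialized payload stays within max_bytes.

    `envelope` holds the fields sent with every batch (for example
    experiment_name and dataset_name). The running size is a slight
    overestimate, so batches never exceed the limit because of it.
    """
    base = payload_size({**envelope, "items": []})
    batches, current, current_size = [], [], base
    for item in items:
        item_size = payload_size(item) + 2  # +2 for the ", " list separator
        if current and current_size + item_size > max_bytes:
            batches.append(current)
            current, current_size = [], base
        current.append(item)
        current_size += item_size
    if current:
        batches.append(current)
    return batches


envelope = {"experiment_name": "e", "dataset_name": "d"}
items = [
    {"dataset_item_id": str(i), "evaluate_task_result": {"x": "a" * 50}}
    for i in range(10)
]
batches = split_items_by_size(items, envelope, max_bytes=300)
print([len(batch) for batch in batches])
```

A single item larger than the limit still ends up alone in its own batch, so oversized items need separate handling.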
To divide the data into smaller batches, add the experiment_id to each payload
so the experiment items are added to an existing experiment.
Below is an example of splitting the evaluation_items into two batches which will both be added
to the same experiment:
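A minimal sketch of that split, assuming the payload fields named in this guide. The `experiment_id` is passed in by the caller; how it is obtained is not shown here, and the value used below is only a placeholder.

```python
def split_into_two_batches(evaluation_items, experiment_name, dataset_name, experiment_id):
    """Build two bulk payloads that target the same experiment."""
    midpoint = (len(evaluation_items) + 1) // 2
    halves = [evaluation_items[:midpoint], evaluation_items[midpoint:]]
    return [
        {
            "experiment_name": experiment_name,
            "dataset_name": dataset_name,
            "experiment_id": experiment_id,  # same ID in every batch
            "items": half,
        }
        for half in halves
        if half  # skip an empty second half when there is only one item
    ]


# Hypothetical items and experiment ID, for illustration only.
evaluation_items = [
    {"dataset_item_id": f"item-{i}", "evaluate_task_result": {"prediction": str(i)}}
    for i in range(5)
]
payloads = split_into_two_batches(
    evaluation_items,
    experiment_name="my-experiment",
    dataset_name="my-dataset",
    experiment_id="<your-experiment-id>",
)
for payload in payloads:
    print(payload["experiment_id"], len(payload["items"]))
```

Each payload would then be sent to the bulk endpoint as a separate request.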
Once you have logged your experiment items, you can analyze the results in the Opik UI and even compare different experiments to one another.
Here’s a complete example that puts all the steps together:
You can include full execution traces with your experiment items for complete observability. To
achieve this, add trace and spans fields to your experiment items:
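The sketch below shows one way such an item might be assembled, along with a small check for the rule that an item carries either `evaluate_task_result` or `trace`, not both. The field names inside `trace` and inside each span are illustrative assumptions; consult the API reference for the actual schema.

```python
def build_traced_item(dataset_item_id, trace, spans):
    """Assemble an experiment item that carries a trace and its spans."""
    return {"dataset_item_id": dataset_item_id, "trace": trace, "spans": spans}


def check_result_or_trace(item):
    """Reject items that set both evaluate_task_result and trace."""
    if "evaluate_task_result" in item and "trace" in item:
        raise ValueError("use evaluate_task_result or trace, not both")
    return item


# Hypothetical trace and span contents, for illustration only.
trace = {
    "name": "qa_pipeline",
    "input": {"question": "What is the capital of France?"},
    "output": {"answer": "Paris"},
}
spans = [
    {"name": "retrieve_context", "type": "general"},
    {"name": "generate_answer", "type": "llm"},
]

item = check_result_or_trace(
    build_traced_item("0192f3a4-0000-7000-8000-000000000001", trace, spans)
)
print(sorted(item))
```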
Provide either evaluate_task_result or trace, not both.
For Java developers, here’s how to integrate with Opik using Jackson and HttpClient:
If you are using the REST API with a local deployment, you can call the endpoints using:
PUT /api/v1/private/experiments/items/bulk
Payload fields: experiment_name, dataset_name, items (with dataset_item_id)
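A minimal sketch of preparing that request with Python's standard library. The endpoint path and top-level field names come from this guide. The base URL is a placeholder to replace with your deployment's address, and any authentication headers your deployment needs are not covered here.

```python
import json
import urllib.request

BULK_PATH = "/api/v1/private/experiments/items/bulk"


def build_bulk_request(base_url, payload):
    """Prepare (but do not send) a PUT request for the bulk endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + BULK_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder base URL and hypothetical payload values.
request = build_bulk_request(
    "http://opik.example.local",
    {
        "experiment_name": "my-experiment",
        "dataset_name": "my-dataset",
        "items": [
            {
                "dataset_item_id": "0192f3a4-0000-7000-8000-000000000001",
                "evaluate_task_result": {"prediction": "Paris"},
            }
        ],
    },
)
print(request.get_method(), request.full_url)

# To send it, uncomment the next line once base_url points at a real deployment:
# urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to inspect the body or check its size before anything goes over the network.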