Evaluate Function
In Opik 2.0, experiments are project-scoped. Make sure to specify a projectName when calling evaluate() so results are associated with the correct project.
The evaluate function allows you to run comprehensive evaluations of LLM tasks against datasets using customizable metrics.
Parameters
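The function accepts a single options parameter of type EvaluateOptions. Its properties include the dataset to evaluate, the task to run on each dataset item, the metrics used to score the task outputs, the projectName that scopes the experiment, and an optional client.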
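Here is a minimal, self-contained sketch of how a single options object can drive an evaluation: run the task on every dataset item, then score each output with every metric. All names here (`EvaluateOptionsSketch`, `Metric`, `runEvaluationSketch`, and the property names `dataset`, `task`, `metrics`) are hypothetical and do not reproduce the actual EvaluateOptions definition. Only `projectName` and `client` come from this page. The generic parameters show how type parameters can type dataset items and task outputs. The sketch is synchronous for brevity, whereas the real function returns a Promise.

```typescript
// Hypothetical sketch: NOT the Opik SDK's EvaluateOptions type.
// TItem types each dataset item; TOutput types what the task returns.
interface Metric<TItem, TOutput> {
  name: string;
  // Returns a score for one task output (here 0 or 1).
  score: (item: TItem, output: TOutput) => number;
}

interface EvaluateOptionsSketch<TItem, TOutput> {
  dataset: TItem[];                   // hypothetical: plain array of items
  task: (item: TItem) => TOutput;     // hypothetical: synchronous task
  metrics: Metric<TItem, TOutput>[];  // hypothetical property name
  projectName: string;                // named on this page
  client?: unknown;                   // named on this page; unused in this sketch
}

interface SampleResult<TItem, TOutput> {
  item: TItem;
  output: TOutput;
  scores: Record<string, number>;
}

// Runs the task on every item and scores each output with every metric.
function runEvaluationSketch<TItem, TOutput>(
  options: EvaluateOptionsSketch<TItem, TOutput>,
): SampleResult<TItem, TOutput>[] {
  return options.dataset.map((item) => {
    const output = options.task(item);
    const scores: Record<string, number> = {};
    for (const metric of options.metrics) {
      scores[metric.name] = metric.score(item, output);
    }
    return { item, output, scores };
  });
}

// Usage: typed items and outputs with an exact-match style metric.
type QAItem = { input: string; expected: string };

const exactMatch: Metric<QAItem, string> = {
  name: "exact_match",
  score: (item, output) => (output === item.expected ? 1 : 0),
};

const results = runEvaluationSketch<QAItem, string>({
  dataset: [
    { input: "paris", expected: "PARIS" },
    { input: "rome", expected: "Roma" },
  ],
  task: (item) => item.input.toUpperCase(),
  metrics: [exactMatch],
  projectName: "my-project", // hypothetical project name
});
```

Because the item and output types flow from the generic parameters, a metric that reads a field the dataset items do not have is rejected at compile time.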
Returns
The function returns a Promise that resolves to an EvaluationResult object containing:
- Aggregated scores across all evaluated samples
- Individual sample results
- Execution metadata
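Below is a minimal sketch of how aggregated scores can be derived from individual sample results. It assumes aggregation by arithmetic mean per metric. The names `SampleScores` and `aggregateScores` and the choice of the mean are illustrative assumptions, because this page does not specify how an EvaluationResult aggregates its scores.

```typescript
// Hypothetical per-sample result: metric name -> score for one sample.
type SampleScores = Record<string, number>;

// Assumed aggregation: arithmetic mean of each metric over the samples
// that reported it. A metric missing from a sample is skipped, not zeroed.
function aggregateScores(samples: SampleScores[]): Record<string, number> {
  const sums: Record<string, number> = {};
  const counts: Record<string, number> = {};
  for (const sample of samples) {
    for (const [metric, value] of Object.entries(sample)) {
      sums[metric] = (sums[metric] ?? 0) + value;
      counts[metric] = (counts[metric] ?? 0) + 1;
    }
  }
  const means: Record<string, number> = {};
  for (const metric of Object.keys(sums)) {
    means[metric] = sums[metric] / counts[metric];
  }
  return means;
}

// Three samples; the third did not report the "contains" metric.
const aggregated = aggregateScores([
  { exact_match: 1, contains: 1 },
  { exact_match: 0, contains: 1 },
  { exact_match: 1 },
]);
```

Keeping per-metric counts, instead of dividing by the total number of samples, avoids penalizing a metric that was not computed for some samples.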
Example Usage
Evaluating a Specific Version
For reproducible evaluations, pass a DatasetVersion instead of a Dataset, so that repeated runs evaluate the same fixed set of items.
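A toy in-memory model can illustrate why pinning a version makes runs reproducible. `VersionedDatasetSketch`, `commit`, `current`, and `getVersion` are hypothetical names, not the Opik Dataset or DatasetVersion API. The point is only that a version is an immutable snapshot, so later edits to the live dataset do not change what an evaluation against that version sees.

```typescript
// Hypothetical toy model, not the Opik Dataset/DatasetVersion API.
class VersionedDatasetSketch<TItem> {
  private items: TItem[] = [];
  private versions: ReadonlyArray<TItem>[] = [];

  insert(item: TItem): void {
    this.items.push(item);
  }

  // Snapshot the current items (shallow, frozen copy) and return its index.
  commit(): number {
    this.versions.push(Object.freeze([...this.items]));
    return this.versions.length - 1;
  }

  // Live view: changes whenever items are inserted.
  current(): ReadonlyArray<TItem> {
    return this.items;
  }

  // Pinned view: always returns the items captured by commit().
  getVersion(version: number): ReadonlyArray<TItem> {
    const snapshot = this.versions[version];
    if (snapshot === undefined) {
      throw new RangeError(`unknown version ${version}`);
    }
    return snapshot;
  }
}

const ds = new VersionedDatasetSketch<string>();
ds.insert("q1");
ds.insert("q2");
const v0 = ds.commit();
ds.insert("q3"); // edit made after the snapshot
// The live view now has three items; version v0 still has two.
```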
Notes
- The function automatically creates an experiment in Opik for tracking and analysis
- If no client is provided, it uses the global Opik client instance
- You can provide type parameters to properly type your dataset and task inputs/outputs
- Errors raised during evaluation are logged and then re-thrown to the caller
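Two of the notes describe common patterns: falling back to a shared default client when none is passed, and logging an error before re-throwing it so the caller still sees the failure. The sketch below illustrates both with hypothetical names (`ClientSketch`, `globalClientSketch`, `evaluateWithLogging`). It is not the Opik implementation.

```typescript
// Hypothetical client type standing in for an SDK client in this sketch.
interface ClientSketch {
  name: string;
}

// Module-level default, used when the caller passes no client.
const globalClientSketch: ClientSketch = { name: "global" };

function evaluateWithLogging<T>(
  run: (client: ClientSketch) => T,
  log: (message: string) => void,
  client?: ClientSketch,
): T {
  const resolved = client ?? globalClientSketch;
  try {
    return run(resolved);
  } catch (error) {
    // Log with context, then re-throw the original error unchanged.
    const message = error instanceof Error ? error.message : String(error);
    log(`evaluation failed (client=${resolved.name}): ${message}`);
    throw error;
  }
}

const logged: string[] = [];
// No client passed, so the global default is used.
const usedClient = evaluateWithLogging((c) => c.name, (m) => logged.push(m));
```

Re-throwing the original error object rather than wrapping it preserves its type and stack trace for the caller.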