In just 15 minutes, learn how to evaluate your AI models with Opik’s TypeScript SDK. This guide will walk you through creating a dataset, defining an evaluation task, and analyzing results with built-in metrics – everything you need to start making data-driven decisions about your AI systems.
In Opik 2.0, datasets and experiments are project-scoped rather than workspace-scoped. Pass projectName to getOrCreateDataset() and evaluate() to associate them with the correct project.
💡 Copy, paste, and run this complete example that:
This section imports the necessary dependencies and configures your evaluation environment. The dotenv package securely loads your API keys from a .env file:
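As a hedged sketch of the configuration step, the helper below reports which required keys are absent before any client is created. The name `findMissingKeys` is hypothetical and not part of dotenv or the Opik SDK, and the key names `OPIK_API_KEY` and `OPENAI_API_KEY` are assumptions about what your .env file defines. In a real script you would pass `process.env` after dotenv has loaded it; here a plain object stands in so the sketch runs on its own.

```typescript
// Hypothetical helper, not part of the Opik SDK or dotenv.
type Env = Record<string, string | undefined>;

function findMissingKeys(env: Env, required: string[]): string[] {
  // A key counts as missing when it is unset or contains only whitespace.
  return required.filter((key) => (env[key] ?? "").trim() === "");
}

// Assumed key names; adjust to whatever your .env file defines.
const REQUIRED_KEYS = ["OPIK_API_KEY", "OPENAI_API_KEY"];

// Stand-in for process.env so the sketch runs without a .env file.
const exampleEnv: Env = { OPIK_API_KEY: "opik-key", OPENAI_API_KEY: "" };

const missing = findMissingKeys(exampleEnv, REQUIRED_KEYS);
```

Failing fast on a non-empty `missing` array gives a clearer error than an authentication failure halfway through an evaluation run.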
This section creates your evaluation dataset with full TypeScript support:
- Use getOrCreateDataset to work seamlessly with existing or new datasets

📌 Best practice: Add descriptive metadata to each item for powerful filtering and analysis in the Opik UI.
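To make the metadata advice concrete, here is a minimal sketch of a typed item schema. The `DatasetItem` shape, its field names, and the sample questions are illustrative assumptions, not a schema that Opik requires. The `countByCategory` helper is likewise hypothetical and only shows the kind of slicing that descriptive metadata makes possible.

```typescript
// Hypothetical item shape; field names are illustrative, not required by Opik.
type DatasetItem = {
  input: string;
  expected: string;
  metadata: { category: string; difficulty: "easy" | "medium" | "hard" };
};

const items: DatasetItem[] = [
  {
    input: "What is the capital of France?",
    expected: "Paris",
    metadata: { category: "geography", difficulty: "easy" },
  },
  {
    input: "What is 12 * 12?",
    expected: "144",
    metadata: { category: "math", difficulty: "easy" },
  },
  {
    input: "Who wrote 'Middlemarch'?",
    expected: "George Eliot",
    metadata: { category: "literature", difficulty: "medium" },
  },
];

// Count items per category, the kind of slice that descriptive metadata enables.
function countByCategory(data: DatasetItem[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const item of data) {
    const category = item.metadata.category;
    counts[category] = (counts[category] ?? 0) + 1;
  }
  return counts;
}

const categoryCounts = countByCategory(items);
```

Because the schema is an ordinary TypeScript type, a misspelled field or an unsupported `difficulty` value is caught at compile time rather than after the items have been uploaded.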
Your evaluation task:
This single function call:
- ExactMatch metric that compares outputs exactly

When you run this code, you’ll receive an evaluation result object containing:

- experimentId: Unique identifier for your evaluation experiment
- experimentName: The name you provided
- testResults: Array of results for each dataset item
  - testCase: Contains the input data and outputs
  - scoreResults: Array of scores from each metric
- resultUrl: Link to view detailed results in the Opik platform

- Set up your .env file correctly
- Check scoringKeyMapping to ensure it maps correctly to your dataset structure
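The result fields described above (experimentId, experimentName, testResults, resultUrl) can be explored with a small local sketch. Everything below is a stand-in that runs without the SDK. The inner fields of `testCase`, the `name` and `value` fields of a score, the `exactMatch` function, and all sample values are assumptions for illustration, not the SDK's actual types or its ExactMatch implementation.

```typescript
// Local sketch types; only the top-level field names come from the text above.
type ScoreResult = { name: string; value: number }; // assumed score fields
type TestResult = {
  testCase: { input: string; output: string; expected: string }; // assumed inner fields
  scoreResults: ScoreResult[];
};
type EvaluationResultSketch = {
  experimentId: string;
  experimentName: string;
  testResults: TestResult[];
  resultUrl: string;
};

// Stand-in for an exact-match metric: 1 when the strings are identical, else 0.
function exactMatch(output: string, expected: string): ScoreResult {
  return { name: "exact_match", value: output === expected ? 1 : 0 };
}

function toTestResult(input: string, output: string, expected: string): TestResult {
  return {
    testCase: { input, output, expected },
    scoreResults: [exactMatch(output, expected)],
  };
}

const mockResult: EvaluationResultSketch = {
  experimentId: "exp-123", // placeholder value
  experimentName: "My First Evaluation", // placeholder value
  testResults: [
    toTestResult("What is the capital of France?", "Paris", "Paris"),
    toTestResult("What is 12 * 12?", "144.", "144"),
  ],
  resultUrl: "https://example.invalid/experiments/exp-123", // placeholder value
};

// Average each metric's scores across all test results.
function averageScores(result: EvaluationResultSketch): Record<string, number> {
  const sums: Record<string, { total: number; count: number }> = {};
  for (const test of result.testResults) {
    for (const score of test.scoreResults) {
      const entry = sums[score.name] ?? { total: 0, count: 0 };
      entry.total += score.value;
      entry.count += 1;
      sums[score.name] = entry;
    }
  }
  const averages: Record<string, number> = {};
  for (const name of Object.keys(sums)) {
    averages[name] = sums[name].total / sums[name].count;
  }
  return averages;
}

const averages = averageScores(mockResult);
```

The second mock item scores 0 because of the trailing period in "144.", which shows how strict exact comparison is. If your model's outputs vary in punctuation or casing, normalize them in the task or choose a more lenient metric.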