Quick Start
In just 15 minutes, learn how to evaluate your AI models with Opik’s TypeScript SDK. This guide will walk you through creating a dataset, defining an evaluation task, and analyzing results with built-in metrics – everything you need to start making data-driven decisions about your AI systems.
Complete Working Example
💡 Copy, paste, and run this complete example that:
- Creates a structured dataset for AI evaluation
- Defines an evaluation task using OpenAI’s latest models
- Runs an evaluation with built-in metrics and analyzes the results
Step-by-Step Walkthrough
1. Setting up environment
This section imports the necessary dependencies and configures your evaluation environment. The `dotenv` package securely loads your API keys from a `.env` file:
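For example, a minimal `.env` file might look like the following; the variable names here are assumptions, so use the names your Opik workspace and model provider expect:

```shell
# .env -- never commit this file to version control
OPIK_API_KEY=your-opik-api-key
OPENAI_API_KEY=your-openai-api-key
```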
2. Building a structured evaluation dataset
This section creates your evaluation dataset with full TypeScript support:
- Initialize the client: Connect to Opik’s evaluation platform
- Define your schema: Use TypeScript types for dataset items with full IDE autocompletion
- Retrieve or create: Use `getOrCreateDataset` to seamlessly work with existing or new datasets
- Add evaluation items: Structure your test cases with inputs, expected outputs, and rich metadata for filtering and analysis
📌 Best practice: Add descriptive metadata to each item for powerful filtering and analysis in the Opik UI.
3. Defining your evaluation task
Your evaluation task:
- Receives dataset items: Automatically processes each item in your dataset
- Integrates with any API: Works with OpenAI, Anthropic, your own models, or any API
- Returns structured output: Package results in a format ready for evaluation
4. Running your evaluation
This single function call takes:
- The dataset we created
- Our defined LLM task
- The built-in `ExactMatch` metric, which compares outputs exactly
- A name for the experiment
- Key mapping to connect dataset fields with metric inputs
Expected Output
When you run this code, you’ll receive an evaluation result object containing:
- `experimentId`: Unique identifier for your evaluation experiment
- `experimentName`: The name you provided
- `testResults`: Array of results for each dataset item
  - `testCase`: Contains the input data and outputs
  - `scoreResults`: Array of scores from each metric
- `resultUrl`: Link to view detailed results in the Opik platform
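For instance, a small helper can walk this structure to compute a mean score. The types below mirror the field list above; the exact shape of `scoreResults` entries is an assumption:

```typescript
// Assumed result shapes, based on the field names in this guide
type ScoreResult = { name: string; value: number };
type TestResult = { testCase: unknown; scoreResults: ScoreResult[] };
type EvaluationSummary = {
  experimentId: string;
  experimentName: string;
  testResults: TestResult[];
  resultUrl: string;
};

function summarize(result: EvaluationSummary): number {
  // Average all metric scores across dataset items
  const scores = result.testResults.flatMap((t) => t.scoreResults.map((s) => s.value));
  const avg = scores.reduce((a, b) => a + b, 0) / Math.max(scores.length, 1);
  console.log(`${result.experimentName}: mean score ${avg.toFixed(2)} -> ${result.resultUrl}`);
  return avg;
}
```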
Troubleshooting & Best Practices
API Key Issues
- Make sure you’ve set up your `.env` file correctly
- Verify your API keys are valid and have the correct permissions
Metric Input Mapping
- Review your `scoringKeyMapping` to ensure it maps correctly to your dataset structure
- Check that every input a metric requires is provided, either in the task output or via the mapping
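Conceptually, the mapping renames dataset or task fields into the names a metric expects. The helper below is a hypothetical illustration of that renaming, not part of the SDK:

```typescript
// Per this guide, ExactMatch compares "output" against "expected"; if your
// dataset stores the reference answer under "expectedOutput", map it:
const scoringKeyMapping = { expected: "expectedOutput" };

// Hypothetical helper showing what the mapping does conceptually
function applyMapping(
  data: Record<string, unknown>,
  mapping: Record<string, string>
): Record<string, unknown> {
  const mapped: Record<string, unknown> = { ...data };
  for (const [metricKey, sourceKey] of Object.entries(mapping)) {
    mapped[metricKey] = data[sourceKey]; // expose the value under the metric's name
  }
  return mapped;
}
```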