Models

The TypeScript SDK provides flexible model configuration through direct integration with the Vercel AI SDK. You can use models from multiple providers with a simple, unified interface.

Overview

The TypeScript SDK supports three ways to configure models for evaluation and prompt generation:

  1. Model ID strings - Simple string identifiers (e.g., "gpt-4o", "claude-3-5-sonnet-latest")
  2. LanguageModel instances - Pre-configured Vercel AI SDK models with custom settings
  3. OpikBaseModel implementations - Custom model integrations for unsupported providers
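Model ID strings and LanguageModel instances both rely on the corresponding Vercel AI SDK provider packages. If they aren't already part of your project, install only the providers you need (package names as referenced throughout this page):

```bash
npm install opik @ai-sdk/openai @ai-sdk/anthropic @ai-sdk/google
```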

Quick Start

Using Model ID Strings

The simplest approach is to pass a model ID string directly:

```typescript
import { evaluatePrompt, Hallucination } from "opik";

// OpenAI model
await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "gpt-4o",
});

// Anthropic model
await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "claude-3-5-sonnet-latest",
});

// Google Gemini model
await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "gemini-2.0-flash",
});

// Use in metrics
const metric = new Hallucination({ model: "gpt-4o" });
```

Using LanguageModel Instances

For advanced scenarios, use LanguageModel instances from the Vercel AI SDK:

```typescript
import { openai } from "@ai-sdk/openai";
import { evaluatePrompt } from "opik";

// Create a LanguageModel instance
const customModel = openai("gpt-4o");

await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: customModel,
});
```

Generation Parameters

Parameters for Metrics

All LLM Judge metrics support these generation parameters directly in the constructor:

```typescript
import { Hallucination } from "opik";

const metric = new Hallucination({
  model: "gpt-4o",
  temperature: 0.3, // Control randomness (0.0-2.0)
  seed: 42, // For reproducible outputs
  maxTokens: 1000, // Maximum response length
});

// Use in evaluation
const score = await metric.score({
  input: "What is the capital of France?",
  output: "The capital of France is Paris.",
  context: ["France is a country in Western Europe."],
});
```

For advanced generation parameters, use modelSettings:

```typescript
import { Hallucination } from "opik";

const metric = new Hallucination({
  model: "gpt-4o",
  temperature: 0.5,
  modelSettings: {
    topP: 0.9, // Nucleus sampling
    topK: 50, // Top-K sampling
    presencePenalty: 0.1, // Reduce repetition
    frequencyPenalty: 0.2, // Reduce phrase repetition
    stopSequences: ["END"], // Custom stop sequences
  },
});
```

Parameters for evaluatePrompt

At the top level, the evaluatePrompt function supports only the temperature and seed generation parameters; metrics passed to it retain full parameter support:

```typescript
import { evaluatePrompt, Hallucination } from "opik";

await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "gpt-4o",
  temperature: 0.7,
  seed: 42,
  scoringMetrics: [
    new Hallucination({
      model: "gpt-4o",
      temperature: 0.3, // Full parameter support in metrics
      seed: 12345,
      maxTokens: 1000,
    }),
  ],
});
```

Note: For full control over all Vercel AI SDK parameters, create a LanguageModel instance with your desired configuration and pass it to the model parameter. See Using LanguageModel Instances below.

Supported Providers

OpenAI

OpenAI models are supported through the @ai-sdk/openai package.

Example model IDs:

1"gpt-4o";
2"gpt-4o-mini";
3"gpt-4-turbo";

Usage:

```typescript
import { evaluatePrompt } from "opik";

await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "gpt-4o",
});
```

For a complete list of available models, see the Vercel AI SDK OpenAI provider documentation.

Anthropic

Anthropic’s Claude models are supported through the @ai-sdk/anthropic package.

Example model IDs:

1"claude-3-5-sonnet-latest";
2"claude-3-5-haiku-latest";

Usage:

```typescript
await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "claude-3-5-sonnet-latest",
});
```

For a complete list of available models, see the Vercel AI SDK Anthropic provider documentation.

Google Gemini

Google’s Gemini models are supported through the @ai-sdk/google package.

Example model IDs:

1"gemini-2.0-flash";
2"gemini-1.5-pro";

Usage:

```typescript
await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "gemini-2.0-flash",
});
```

For a complete list of available models, see the Vercel AI SDK Google provider documentation.
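If you need to point a provider at a custom endpoint or pass an API key explicitly rather than through environment variables, each provider package also exports a factory function. A minimal sketch using the createOpenAI factory from @ai-sdk/openai (createAnthropic and createGoogleGenerativeAI work analogously); the proxy URL here is hypothetical:

```typescript
import { createOpenAI } from "@ai-sdk/openai";
import { evaluatePrompt } from "opik";

// Build a provider instance with explicit credentials and a custom endpoint
const openai = createOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://my-proxy.example.com/v1", // hypothetical proxy endpoint
});

// The resulting LanguageModel is passed to Opik like any other
await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: openai("gpt-4o"),
});
```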

Using Models in Opik

Using LanguageModel Instances

For advanced scenarios requiring full Vercel AI SDK features (such as structured outputs, custom headers, or provider-specific parameters), create LanguageModel instances directly:

```typescript
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";
import { evaluatePrompt, Hallucination } from "opik";

// Create models with advanced configuration
const genModel = openai("gpt-4o-mini", {
  structuredOutputs: true, // Provider-specific feature
});

const evalModel = anthropic("claude-3-5-sonnet-latest");

// Use different models for generation and evaluation
await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: genModel,
  scoringMetrics: [new Hallucination({ model: evalModel })],
});
```

This approach gives you full control over Vercel AI SDK parameters that aren’t exposed through Opik’s simple interface.

Using Models with Metrics

LLM Judge metrics accept model configuration:

With Model ID String

```typescript
import { evaluatePrompt, Hallucination, AnswerRelevance } from "opik";

// Use different models for different metrics
const hallucinationMetric = new Hallucination({ model: "gpt-4o" });
const relevanceMetric = new AnswerRelevance({
  model: "claude-3-5-sonnet-latest",
});

await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "gpt-4o",
  scoringMetrics: [hallucinationMetric, relevanceMetric],
});
```

With LanguageModel Instance

```typescript
import { openai } from "@ai-sdk/openai";
import { evaluatePrompt, Hallucination } from "opik";

// Create model for metric evaluation
const judgeModel = openai("gpt-4o");

const metric = new Hallucination({ model: judgeModel });

await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "gpt-4o",
  scoringMetrics: [metric],
});
```

Custom Model Implementation

For unsupported providers, implement the OpikBaseModel interface:

OpikBaseModel Interface

```typescript
abstract class OpikBaseModel {
  constructor(public readonly modelName: string) {}

  /**
   * Generate a string response from a text prompt
   */
  abstract generateString(input: string): Promise<string>;

  /**
   * Generate a response from messages in the provider-specific format
   */
  abstract generateProviderResponse(messages: OpikMessage[]): Promise<unknown>;
}
```

Example Implementation

```typescript
import { evaluatePrompt, OpikBaseModel, OpikMessage } from "opik";

class CustomProviderModel extends OpikBaseModel {
  private apiKey: string;
  private baseUrl: string;

  constructor(modelName: string, apiKey: string, baseUrl: string) {
    super(modelName);
    this.apiKey = apiKey;
    this.baseUrl = baseUrl;
  }

  async generateString(input: string): Promise<string> {
    const messages: OpikMessage[] = [
      {
        role: "user",
        content: input,
      },
    ];

    // Extract text from the provider's OpenAI-style response format
    const response = (await this.generateProviderResponse(messages)) as {
      choices: { message: { content: string } }[];
    };
    return response.choices[0].message.content;
  }

  async generateProviderResponse(messages: OpikMessage[]): Promise<unknown> {
    const response = await fetch(`${this.baseUrl}/chat/completions`, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${this.apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: this.modelName,
        messages: messages,
      }),
    });

    if (!response.ok) {
      throw new Error(`API request failed: ${response.statusText}`);
    }

    return response.json();
  }
}

// Usage
const customModel = new CustomProviderModel(
  "custom-model-v1",
  process.env.CUSTOM_API_KEY!,
  "https://api.custom-provider.com"
);

await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: customModel,
});
```
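Since an OpikBaseModel is accepted anywhere a model is configured (see Model Resolution below), the same instance can plausibly back an LLM Judge metric as well. A short sketch, assuming metrics resolve OpikBaseModel instances the same way evaluatePrompt does:

```typescript
import { Hallucination } from "opik";

// Assumption: metrics accept OpikBaseModel instances like evaluatePrompt does
const metric = new Hallucination({ model: customModel });

const score = await metric.score({
  input: "What is the capital of France?",
  output: "The capital of France is Paris.",
  context: ["France is a country in Western Europe."],
});
```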

Model Resolution

The SDK automatically resolves models in this order:

  1. If a string is provided: Auto-detects provider and creates appropriate model
  2. If LanguageModel is provided: Uses the instance directly
  3. If OpikBaseModel is provided: Uses the custom implementation
  4. If undefined: Defaults to "gpt-4o"

```typescript
// String → Auto-detected as OpenAI
model: "gpt-4o";

// LanguageModel → Used directly
import { openai } from "@ai-sdk/openai";
model: openai("gpt-4o");

// Custom implementation
model: new CustomProviderModel("my-model", apiKey, baseUrl);

// Undefined → Defaults to "gpt-4o"
model: undefined;
```

Best Practices

1. Use Model ID Strings for Simplicity

For most use cases, use model ID strings directly:

```typescript
import { Hallucination } from "opik";

const metric = new Hallucination({ model: "gpt-4o" });
```

The Opik SDK handles model configuration internally for optimal evaluation performance.

2. Match Model Capabilities to Task

Choose models based on task requirements:

```typescript
// Complex reasoning: GPT-4o, Claude Sonnet
model: "gpt-4o";
model: "claude-3-5-sonnet-latest";

// Fast responses: GPT-4o-mini, Gemini Flash
model: "gpt-4o-mini";
model: "gemini-2.0-flash";

// Long context: Claude, Gemini
model: "claude-3-5-sonnet-latest"; // 200K context
model: "gemini-1.5-pro"; // 1M context
```

3. Use Different Models for Tasks and Metrics

Optimize costs by using a cheaper model for generation and a stronger one for scoring:

```typescript
import { evaluatePrompt, Hallucination } from "opik";

await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "gpt-4o-mini", // Cheaper for generation
  scoringMetrics: [
    new Hallucination({ model: "gpt-4o" }), // More accurate for evaluation
  ],
});
```

4. Configure API Keys

Set up environment variables for each provider:

```bash
# OpenAI
export OPENAI_API_KEY="sk-..."

# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

# Google
export GOOGLE_API_KEY="..."
```

5. Handle Rate Limits

Use appropriate worker counts to avoid rate limits:

```typescript
await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "gpt-4o",
  taskWorkers: 5, // Limit parallel requests
});
```

Troubleshooting

API Key Not Found

```typescript
// Error: API key not found for provider

// Solution: Set environment variable
process.env.OPENAI_API_KEY = "sk-...";
```

Model Not Supported

```typescript
// Error: Unsupported model ID

// Solution: Use a custom OpikBaseModel implementation (see Custom Model Implementation above)
class MyModel extends OpikBaseModel {
  // ... implementation
}
```

Rate Limit Errors

```typescript
// Error: Rate limit exceeded

// Solution: Reduce worker count
await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "gpt-4o",
  taskWorkers: 3, // Reduce from default 10
});
```

See Also