Models

The TypeScript SDK provides flexible model configuration through direct integration with the Vercel AI SDK. You can use models from multiple providers with a simple, unified interface.

Overview

The TypeScript SDK supports three ways to configure models for evaluation and prompt generation:

  1. Model ID strings - Simple string identifiers (e.g., "gpt-4o", "claude-3-5-sonnet-latest")
  2. LanguageModel instances - Pre-configured Vercel AI SDK models with custom settings
  3. OpikBaseModel implementations - Custom model integrations for unsupported providers
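Model ID strings and LanguageModel instances both rely on the corresponding Vercel AI SDK provider packages. If they aren't already part of your project, install only the providers you need (package names as referenced throughout this page):

```bash
npm install opik @ai-sdk/openai @ai-sdk/anthropic @ai-sdk/google
```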

Quick Start

Using Model ID Strings

The simplest approach is to pass a model ID string directly:

```typescript
import { evaluatePrompt, Hallucination } from "opik";

// OpenAI model
await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "gpt-4o",
});

// Anthropic model
await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "claude-3-5-sonnet-latest",
});

// Google Gemini model
await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "gemini-2.0-flash",
});

// Use in metrics
const metric = new Hallucination({ model: "gpt-4o" });
```

Using LanguageModel Instances

For advanced scenarios, use LanguageModel instances from the Vercel AI SDK:

```typescript
import { openai } from "@ai-sdk/openai";
import { evaluatePrompt } from "opik";

// Create a LanguageModel instance
const customModel = openai("gpt-4o");

await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: customModel,
});
```

Generation Parameters

Parameters for Metrics

All LLM Judge metrics support these generation parameters directly in the constructor:

```typescript
import { Hallucination } from "opik";

const metric = new Hallucination({
  model: "gpt-4o",
  temperature: 0.3, // Control randomness (0.0-2.0)
  seed: 42, // For reproducible outputs
  maxTokens: 1000, // Maximum response length
});

// Use in evaluation
const score = await metric.score({
  input: "What is the capital of France?",
  output: "The capital of France is Paris.",
  context: ["France is a country in Western Europe."],
});
```

For advanced generation parameters, use modelSettings:

```typescript
import { Hallucination } from "opik";

const metric = new Hallucination({
  model: "gpt-4o",
  temperature: 0.5,
  modelSettings: {
    topP: 0.9, // Nucleus sampling
    topK: 50, // Top-K sampling
    presencePenalty: 0.1, // Reduce repetition
    frequencyPenalty: 0.2, // Reduce phrase repetition
    stopSequences: ["END"], // Custom stop sequences
  },
});
```

Parameters for evaluatePrompt

At the top level, the evaluatePrompt function supports only the temperature and seed generation parameters; metrics passed to it retain full parameter support:

```typescript
import { evaluatePrompt, Hallucination } from "opik";

await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "gpt-4o",
  temperature: 0.7,
  seed: 42,
  scoringMetrics: [
    new Hallucination({
      model: "gpt-4o",
      temperature: 0.3, // Full parameter support in metrics
      seed: 12345,
      maxTokens: 1000,
    }),
  ],
});
```

Note: For full control over all Vercel AI SDK parameters, create a LanguageModel instance with your desired configuration and pass it to the model parameter. See Using LanguageModel Instances below.

Supported Providers

OpenAI

OpenAI models are supported through the @ai-sdk/openai package.

Example model IDs:

1"gpt-4o";
2"gpt-4o-mini";
3"gpt-4-turbo";

Usage:

```typescript
import { evaluatePrompt } from "opik";

await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "gpt-4o",
});
```

For a complete list of available models, see the Vercel AI SDK OpenAI provider documentation.

Anthropic

Anthropic’s Claude models are supported through the @ai-sdk/anthropic package.

Example model IDs:

1"claude-3-5-sonnet-latest";
2"claude-3-5-haiku-latest";

Usage:

```typescript
await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "claude-3-5-sonnet-latest",
});
```

For a complete list of available models, see the Vercel AI SDK Anthropic provider documentation.

Google Gemini

Google’s Gemini models are supported through the @ai-sdk/google package.

Example model IDs:

1"gemini-2.0-flash";
2"gemini-1.5-pro";

Usage:

```typescript
await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "gemini-2.0-flash",
});
```

For a complete list of available models, see the Vercel AI SDK Google provider documentation.
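If you need to point a provider at a custom endpoint or pass an API key explicitly rather than through environment variables, each provider package also exports a factory function. A minimal sketch using the createOpenAI factory from @ai-sdk/openai (createAnthropic and createGoogleGenerativeAI work analogously); the proxy URL here is hypothetical:

```typescript
import { createOpenAI } from "@ai-sdk/openai";
import { evaluatePrompt } from "opik";

// Build a provider instance with explicit credentials and a custom endpoint
const openai = createOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://my-proxy.example.com/v1", // hypothetical proxy endpoint
});

// The resulting LanguageModel is passed to Opik like any other
await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: openai("gpt-4o"),
});
```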

Using Models in Opik

Using LanguageModel Instances

For advanced scenarios requiring full Vercel AI SDK features (such as structured outputs, custom headers, or provider-specific parameters), create LanguageModel instances directly:

```typescript
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";
import { evaluatePrompt, Hallucination } from "opik";

// Create models with advanced configuration
const genModel = openai("gpt-4o-mini", {
  structuredOutputs: true, // Provider-specific feature
});

const evalModel = anthropic("claude-3-5-sonnet-latest");

// Use different models for generation and evaluation
await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: genModel,
  scoringMetrics: [new Hallucination({ model: evalModel })],
});
```

This approach gives you full control over Vercel AI SDK parameters that aren’t exposed through Opik’s simple interface.

Using Models with Metrics

LLM Judge metrics accept model configuration:

With Model ID String

```typescript
import { evaluatePrompt, Hallucination, AnswerRelevance } from "opik";

// Use different models for different metrics
const hallucinationMetric = new Hallucination({ model: "gpt-4o" });
const relevanceMetric = new AnswerRelevance({
  model: "claude-3-5-sonnet-latest",
});

await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "gpt-4o",
  scoringMetrics: [hallucinationMetric, relevanceMetric],
});
```

With LanguageModel Instance

```typescript
import { openai } from "@ai-sdk/openai";
import { evaluatePrompt, Hallucination } from "opik";

// Create model for metric evaluation
const judgeModel = openai("gpt-4o");

const metric = new Hallucination({ model: judgeModel });

await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "gpt-4o",
  scoringMetrics: [metric],
});
```

Custom Model Implementation

For unsupported providers, implement the OpikBaseModel interface:

OpikBaseModel Interface

```typescript
abstract class OpikBaseModel {
  constructor(public readonly modelName: string) {}

  /**
   * Generate a string response from a text prompt
   */
  abstract generateString(input: string): Promise<string>;

  /**
   * Generate a response from messages in the provider-specific format
   */
  abstract generateProviderResponse(messages: OpikMessage[]): Promise<unknown>;
}
```

Example Implementation

```typescript
import { evaluatePrompt, OpikBaseModel, OpikMessage } from "opik";

class CustomProviderModel extends OpikBaseModel {
  private apiKey: string;
  private baseUrl: string;

  constructor(modelName: string, apiKey: string, baseUrl: string) {
    super(modelName);
    this.apiKey = apiKey;
    this.baseUrl = baseUrl;
  }

  async generateString(input: string): Promise<string> {
    const messages: OpikMessage[] = [
      {
        role: "user",
        content: input,
      },
    ];

    // Extract text from the provider's OpenAI-style response format
    const response = (await this.generateProviderResponse(messages)) as {
      choices: { message: { content: string } }[];
    };
    return response.choices[0].message.content;
  }

  async generateProviderResponse(messages: OpikMessage[]): Promise<unknown> {
    const response = await fetch(`${this.baseUrl}/chat/completions`, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${this.apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: this.modelName,
        messages: messages,
      }),
    });

    if (!response.ok) {
      throw new Error(`API request failed: ${response.statusText}`);
    }

    return response.json();
  }
}

// Usage
const customModel = new CustomProviderModel(
  "custom-model-v1",
  process.env.CUSTOM_API_KEY!,
  "https://api.custom-provider.com"
);

await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: customModel,
});
```
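Since an OpikBaseModel is accepted anywhere a model is configured (see Model Resolution below), the same instance can plausibly back an LLM Judge metric as well. A short sketch, assuming metrics resolve OpikBaseModel instances the same way evaluatePrompt does:

```typescript
import { Hallucination } from "opik";

// Assumption: metrics accept OpikBaseModel instances like evaluatePrompt does
const metric = new Hallucination({ model: customModel });

const score = await metric.score({
  input: "What is the capital of France?",
  output: "The capital of France is Paris.",
  context: ["France is a country in Western Europe."],
});
```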

Model Resolution

The SDK automatically resolves models in this order:

  1. If a string is provided: Auto-detects provider and creates appropriate model
  2. If LanguageModel is provided: Uses the instance directly
  3. If OpikBaseModel is provided: Uses the custom implementation
  4. If undefined: Defaults to "gpt-4o"

```typescript
// String → Auto-detected as OpenAI
model: "gpt-4o";

// LanguageModel → Used directly
import { openai } from "@ai-sdk/openai";
model: openai("gpt-4o");

// Custom implementation
model: new CustomProviderModel("my-model", apiKey, baseUrl);

// Undefined → Defaults to "gpt-4o"
model: undefined;
```

Best Practices

1. Use Model ID Strings for Simplicity

For most use cases, use model ID strings directly:

```typescript
import { Hallucination } from "opik";

const metric = new Hallucination({ model: "gpt-4o" });
```

The Opik SDK handles model configuration internally for optimal evaluation performance.

2. Match Model Capabilities to Task

Choose models based on task requirements:

```typescript
// Complex reasoning: GPT-4o, Claude Sonnet
model: "gpt-4o";
model: "claude-3-5-sonnet-latest";

// Fast responses: GPT-4o-mini, Gemini Flash
model: "gpt-4o-mini";
model: "gemini-2.0-flash";

// Long context: Claude, Gemini
model: "claude-3-5-sonnet-latest"; // 200K context
model: "gemini-1.5-pro"; // 1M context
```

3. Use Different Models for Tasks and Metrics

Optimize costs by using a cheaper model for generation and a stronger one for scoring:

```typescript
import { evaluatePrompt, Hallucination } from "opik";

await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "gpt-4o-mini", // Cheaper for generation
  scoringMetrics: [
    new Hallucination({ model: "gpt-4o" }), // More accurate for evaluation
  ],
});
```

4. Configure API Keys

Set up environment variables for each provider:

```bash
# OpenAI
export OPENAI_API_KEY="sk-..."

# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

# Google
export GOOGLE_API_KEY="..."
```

5. Handle Rate Limits

Use appropriate worker counts to avoid rate limits:

```typescript
await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "gpt-4o",
  taskWorkers: 5, // Limit parallel requests
});
```

Troubleshooting

API Key Not Found

```typescript
// Error: API key not found for provider

// Solution: Set environment variable
process.env.OPENAI_API_KEY = "sk-...";
```

Model Not Supported

```typescript
// Error: Unsupported model ID

// Solution: Use a custom OpikBaseModel implementation (see Custom Model Implementation above)
class MyModel extends OpikBaseModel {
  // ... implementation
}
```

Rate Limit Errors

```typescript
// Error: Rate limit exceeded

// Solution: Reduce worker count
await evaluatePrompt({
  dataset,
  messages: [{ role: "user", content: "{{input}}" }],
  model: "gpt-4o",
  taskWorkers: 3, // Reduce from default 10
});
```

See Also