Opik provides a set of LLM as a Judge metrics that are designed to be model-agnostic and can be used with any LLM. In order to achieve this, we use the LiteLLM library to abstract the LLM calls.
By default, Opik will use the gpt-5-nano model. However, you can change this by setting the model parameter when initializing your metric to any model supported by LiteLLM:
In order to use many models supported by LiteLLM, you also need to pass additional parameters. For this, you can use the LiteLLMChatModel class and passing it to the metric:
Many LLM providers (such as SiliconFlow, Together AI, Groq, and others) expose APIs that are compatible with the OpenAI API format. You can use these providers with Opik’s LLM-as-a-Judge metrics by using LiteLLM’s openai/ provider prefix and setting the appropriate environment variables.
This is a simpler alternative to creating a custom model class when your provider already supports the OpenAI API format.
Set OPENAI_API_KEY to your provider’s API key and OPENAI_BASE_URL to the provider’s API endpoint, then use the openai/ prefix when specifying the model name:
The openai/ prefix tells LiteLLM to use the OpenAI-compatible API format with the configured base URL. This approach works with any metric that accepts a model parameter, including Hallucination, Moderation, AnswerRelevance, and others.
For the full list of supported providers and configuration options, see the LiteLLM OpenAI-compatible providers documentation.
Opik’s LLM-as-a-Judge metrics, such as Hallucination, are designed to work with various language models. While Opik supports many models out-of-the-box via LiteLLM, you can integrate any LLM by creating a custom model class. This involves subclassing opik.evaluation.models.OpikBaseModel and implementing its required methods.
OpikBaseModel InterfaceOpikBaseModel is an abstract base class that defines the interface Opik metrics use to interact with LLMs. To create a compatible custom model, you must implement the following methods:
__init__(self, model_name: str):
Initializes the base model with a given model name.generate_string(self, input: str, **kwargs: Any) -> str:
Simplified interface to generate a string output from the model.generate_provider_response(self, **kwargs: Any) -> Any:
Generate a provider-specific response. Can be used to interface with the underlying model provider (e.g., OpenAI, Anthropic) and get raw output.Here’s an example of a custom model class that interacts with an LLM service exposing an OpenAI-compatible API endpoint.
Key considerations for the implementation:
base_url and the JSON payload to match your specific LLM provider’s
requirements if they deviate from the common OpenAI structure.model_name passed to __init__ is used as the model parameter in the API call. Ensure this matches an available model on your LLM service.Hallucination MetricIn order to run an evaluation using your Custom Model with the Hallucination metric,
you will first need to instantiate our CustomOpenAICompatibleModel class and pass it to the Hallucination class.
The evaluation can then be kicked off by calling the Hallucination.score()` method.
Key considerations for the implementation:
Hallucination.score() returns a ScoreResult object containing the metric name (name), score value (value), optional explanation (reason), metadata (metadata), and a failure flag (scoring_failed).The TypeScript SDK integrates seamlessly with the Vercel AI SDK, allowing you to use language models directly with Opik’s evaluation metrics. For comprehensive model configuration including supported providers, generation parameters, and advanced settings, see the Models Reference.
For unsupported LLM providers, implement the OpikBaseModel interface:
Once implemented, use your custom model like any other:
When implementing custom models:
generateString() and generateProviderResponse() methodsFor standard model usage and configuration, refer to the Models Reference.