Multimodal evaluations
Opik lets you evaluate multimodal prompts that combine text and images. You can run these experiments directly from the UI or through the SDKs. This page covers both flows, clarifies which models support image inputs, and explains how to customise model detection.
Online evaluation in the UI
LLM-as-a-Judge experiments in the Opik UI accept image attachments on both the dataset rows and the prompt messages. When you configure an evaluation:
- Open Evaluations → LLM-as-a-Judge and click New evaluation.
- Choose a vision-capable model (for example, gpt-4o or claude-3-5-sonnet).
- Add image URLs or upload files in either the Dataset rows or the Prompt builder.
- Launch the evaluation. Opik automatically keeps the images in the judge prompt when the selected model supports them. If the model does not support images, the UI surfaces a warning and flattens the image reference into a text placeholder so the run still completes.
All multimodal traces appear in the evaluation results, so you can inspect exactly what the judge model received.
Using the SDKs
Both the Python and TypeScript SDKs accept OpenAI-style message payloads. Each message can contain either a string or a list of content blocks. Image blocks use the image_url type and can point to an https:// URL or a data:image/...;base64, payload.
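For illustration, a user message that mixes a text block with an image block looks like this (the URL is a placeholder):

```python
# An OpenAI-style message containing one text block and one image block.
# The URL is a placeholder; a base64 data URL works the same way.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is shown in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
    ],
}
```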
Python example
The evaluator uses LiteLLM-style model identifiers. Opik recognises popular multimodal families (OpenAI GPT-4o, Anthropic Claude 3+, Google Gemini 1.5, Meta Llama 3.2 Vision, Mistral Pixtral, etc.) and treats any model whose name ends with -vision or -vl as vision-capable. Provider prefixes such as anthropic/ are stripped automatically. When a model is not recognised as vision-capable, Opik logs a warning and replaces image blocks with placeholders before making the API call.
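Putting this together, a minimal run with the Python SDK might look like the sketch below. The dataset contents, image URL, and the choice of the Hallucination metric are illustrative, and how individual metrics handle content blocks in their input can vary, so check the reference for the metric you use.

```python
# A minimal sketch: score a small multimodal dataset with an LLM-as-a-judge
# metric. The dataset name, image URL, and metric choice are illustrative.
from opik import Opik
from opik.evaluation import evaluate
from opik.evaluation.metrics import Hallucination

client = Opik()
dataset = client.get_or_create_dataset(name="multimodal-demo")
dataset.insert([
    {
        "input": [
            {"type": "text", "text": "What animal is in this photo?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
        "expected_output": "A cat sitting on a windowsill.",
    },
])

def evaluation_task(item: dict) -> dict:
    # Call your own application here; this stub just echoes the expected answer.
    return {"output": item["expected_output"]}

evaluate(
    dataset=dataset,
    task=evaluation_task,
    scoring_metrics=[Hallucination(model="gpt-4o")],  # vision-capable judge
)
```

Because gpt-4o is recognised as vision-capable, the image block is kept in the judge prompt; swapping in a text-only model would trigger the warning and placeholder behaviour described above.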
TypeScript note
TypeScript support for multimodal evaluations is in progress. The TypeScript SDK will expose the same message structure and detection rules; we’ll update this section with a full example once the implementation lands.
Customising model support
If you are experimenting with a new provider, you can extend the registry at runtime:
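The helper name in the sketch below (register_vision_model) and its module path are assumptions for illustration rather than confirmed API names; check the SDK reference for the exact entry point your version exposes.

```python
# Hypothetical sketch: the module path and helper name are assumptions for
# illustration, not confirmed Opik APIs. The idea is to add an extra model
# identifier to the set of names treated as vision-capable.
from opik.evaluation import models  # assumed location of the model registry

models.register_vision_model("mycloud/awesome-multimodal-1")  # hypothetical helper
```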
Any subsequent evaluations in that process will treat the custom model as vision-capable.
FAQ
How do I confirm whether a model supports images?
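You can query the capability check from the SDK. The supports_vision helper below is an illustrative name rather than a confirmed API, so treat it as an assumption and consult the SDK reference for the exact function.

```python
# Hypothetical sketch: supports_vision is an illustrative name, not a confirmed
# Opik API. The point is a boolean check against the vision-capability registry.
from opik.evaluation import models  # assumed location of the capability check

print(models.supports_vision("gpt-4o"))         # a vision-capable model
print(models.supports_vision("gpt-3.5-turbo"))  # a text-only model
```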
If the call returns False, Opik logs a warning and flattens the image blocks: the image data (URL or base64 payload) is inserted as plain text and truncated to the first 500 characters to keep the prompt manageable.
What kind of image sources can I use?
- Direct https:// URLs (publicly accessible).
- Base64 data URLs such as data:image/png;base64,iVBORw0... (see the sketch after this list).
- Optional OpenAI detail fields ("low", "high") are preserved and forwarded when present.
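For reference, here is what an image block looks like when it combines a base64 data URL with the optional detail hint; the payload is truncated for readability:

```python
# An image content block with a base64 data URL and the optional "detail" hint.
# The base64 payload is truncated here for readability.
image_block = {
    "type": "image_url",
    "image_url": {
        "url": "data:image/png;base64,iVBORw0...",
        "detail": "low",  # preserved and forwarded when present
    },
}
```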
Does this work with LangChain integrations?
Yes. Opik forwards the same OpenAI-style content blocks that LangChain expects, so structured messages with image_url dictionaries continue to work. A simple validation script is shown below:
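The sketch below sends a single multimodal message through a ChatOpenAI model traced with Opik's LangChain integration (OpikTracer); the model name and image URL are placeholders.

```python
# Send one multimodal message through a LangChain chat model traced by Opik.
# OpikTracer logs the call, so you can inspect the content blocks the model
# received in the Opik UI.
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from opik.integrations.langchain import OpikTracer

tracer = OpikTracer()
llm = ChatOpenAI(model="gpt-4o", callbacks=[tracer])

message = HumanMessage(
    content=[
        {"type": "text", "text": "What does this image show?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
    ]
)

print(llm.invoke([message]).content)
```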