PydanticAI and Logfire Trace Fidelity Improvements
Traces from PydanticAI agents ingested over OpenTelemetry (via Logfire) now show complete, correctly typed data and accurate cost. Several gaps in the OTel → Opik span mapping were fixed: Tool call…
Quick MCP Server Setup via opik configure
Setting up Claude Desktop (or any MCP client) to connect to Opik no longer requires manually editing JSON config files. Running opik configure now offers an opt-in MCP setup step at the end of the…
Bug Fixes & Improvements
OpenClaw added to onboarding integrations — The Get Started page now includes OpenClaw with its four-step CLI setup guide (install plugin → configure → status → run), rendered with bash syntax…
Resume Interrupted Evaluations with evaluate_resume
Long-running evaluation jobs that get cut short — by Ctrl-C, an OOM error, a failed scoring metric, or a network blip — can now be continued from where they stopped instead of restarting from…
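The general idea of resuming from where a job stopped can be illustrated with a small checkpointing loop. This is a minimal sketch and not the evaluate_resume API: the function name `run_with_resume`, the JSON checkpoint file, and the string item IDs are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Callable


def run_with_resume(
    items: dict[str, str],
    score: Callable[[str], float],
    checkpoint: Path,
) -> dict[str, float]:
    """Score every item, skipping those already recorded in the checkpoint.

    Hypothetical helper: results are persisted after each item, so a crash
    or Ctrl-C loses at most the item that was in flight.
    """
    # Load results from an earlier, interrupted run if a checkpoint exists.
    done: dict[str, float] = (
        json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    )
    for item_id, payload in items.items():
        if item_id in done:
            continue  # already scored in a previous run
        done[item_id] = score(payload)
        # Persist progress immediately so the next run can pick up from here.
        checkpoint.write_text(json.dumps(done))
    return done
```

Calling the function a second time with the same checkpoint path only scores the items that are missing from the file, which is the behavior the entry above describes at a high level.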
OpenAI Responses API Support in Playground and LLM-as-a-Judge
The Playground and LLM-as-a-Judge now support OpenAI's /v1/responses API, making it possible to use o-series reasoning models (o1, o3, o3-mini, o4-mini) and other deployments that are only available…
Bug Fixes & Improvements
Annotation queues: claim mechanism for parallel annotation — Multiple annotators working the same queue simultaneously now see each item locked while another reviewer is looking at it, preventing…
Performance Improvements
Span timestamp filters use ClickHouse skip indexes — created_at and last_updated_at on the spans and traces tables now have minmax skip indexes. Range filters on these columns prune granules instead…
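To see why a minmax index lets a range filter skip data, the following toy model may help. It is a simplified sketch of the pruning idea, not ClickHouse's implementation: granules are plain Python lists of integer timestamps, and the function names are hypothetical.

```python
def build_minmax_index(granules: list[list[int]]) -> list[tuple[int, int]]:
    """Record the smallest and largest timestamp of each granule."""
    return [(min(rows), max(rows)) for rows in granules]


def range_scan(
    granules: list[list[int]],
    index: list[tuple[int, int]],
    lo: int,
    hi: int,
) -> tuple[list[int], int]:
    """Return rows with lo <= t <= hi and the number of granules actually read."""
    matched: list[int] = []
    granules_read = 0
    for rows, (g_min, g_max) in zip(granules, index):
        # If the granule's [min, max] range cannot overlap [lo, hi],
        # skip it without looking at any of its rows.
        if g_max < lo or g_min > hi:
            continue
        granules_read += 1
        matched.extend(t for t in rows if lo <= t <= hi)
    return matched, granules_read


granules = [[1, 2, 3], [10, 11, 12], [20, 21, 22]]
index = build_minmax_index(granules)
rows, granules_read = range_scan(granules, index, lo=10, hi=20)
# Only the second and third granules overlap [10, 20]; the first is pruned.
```

The benefit grows with how well the column correlates with physical row order: timestamps that increase roughly with insertion time give narrow, mostly non-overlapping ranges per granule, so most granules can be skipped.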
Prompt Library Now Available in Opik 2.0
The Prompt Library is now part of the Opik 2.0 UI, accessible from the project sidebar under Prompt library. Alongside that, prompt versions have gained first-class environment support — you can tag…
Simplified Filters in the Logs View
The Traces, Spans, and Threads tabs now have a redesigned filter bar that makes it faster to narrow down what you're looking at. Filters appear as chips directly in the toolbar — pick a field, set a…
Bug Fixes & Improvements
Test suite assertions: sub-span inspection — the evaluator LLM can now issue get_trace_spans and read tool calls to inspect intermediate spans during evaluation, enabling correctness checks about…
AND/OR Condition Grouping in Alerts
Alert rules now support structured condition grouping: conditions within a group are evaluated with AND, while groups themselves are combined with OR. This makes it possible to express logic such as…
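The evaluation rule (AND within a group, OR across groups) can be sketched in a few lines. This is an illustrative model only: the `Condition` class, the metric names, and the thresholds are hypothetical and do not reflect how Opik stores or evaluates alert rules.

```python
import operator
from dataclasses import dataclass

# Comparison operators a condition may use (assumed set, for illustration).
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class Condition:
    metric: str
    op: str
    threshold: float

    def holds(self, metrics: dict[str, float]) -> bool:
        return OPS[self.op](metrics[self.metric], self.threshold)


def alert_fires(groups: list[list[Condition]], metrics: dict[str, float]) -> bool:
    """True if any non-empty group has all of its conditions satisfied."""
    return any(
        len(group) > 0 and all(cond.holds(metrics) for cond in group)
        for group in groups
    )


# (error_rate > 0.05 AND latency_s > 2.0) OR (cost_usd > 100)
groups = [
    [Condition("error_rate", ">", 0.05), Condition("latency_s", ">", 2.0)],
    [Condition("cost_usd", ">", 100.0)],
]
```

With these groups, a high error rate alone does not fire the alert, because the first group also requires high latency; a cost above the threshold fires it regardless of the other metrics.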
Bug Fixes & Improvements
Prompt masks (Python & TypeScript SDKs) — prompt_mask_context(masks) / promptMaskContext(masks) lets you run agent code with specific prompt IDs silently redirected to a different version ID,…
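The redirect behavior described above resembles a scoped override. The sketch below is a toy model of that pattern built on the standard library's contextvars module; it is not the SDK's implementation, and `toy_prompt_mask_context` and `resolve_version` are hypothetical names chosen so they are not confused with the real prompt_mask_context.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Iterator

# Masks active in the current context: prompt ID -> version ID to use instead.
_active_masks: ContextVar[dict[str, str]] = ContextVar("active_masks", default={})


@contextmanager
def toy_prompt_mask_context(masks: dict[str, str]) -> Iterator[None]:
    """Apply masks for the duration of the with-block, then restore."""
    # Merge with any outer masks so nested contexts compose.
    token = _active_masks.set({**_active_masks.get(), **masks})
    try:
        yield
    finally:
        _active_masks.reset(token)


def resolve_version(prompt_id: str, default_version: str) -> str:
    """Return the masked version if one is active, else the default."""
    return _active_masks.get().get(prompt_id, default_version)
```

Code that calls `resolve_version` inside the with-block sees the masked version without being modified itself, and the original mapping is restored when the block exits, even if it raises.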
🚀 Client-Side Prompt Caching (Python & TypeScript SDKs)
client.get_prompt() and client.get_chat_prompt() now cache results in-process, so repeated calls inside a hot path skip the network round-trip entirely. Pinned commits are cached indefinitely;…
🔌 opik connect CLI Improvements
The opik connect and opik endpoint CLI commands have been reorganized with a much better error experience: Formatted error output — configuration problems now show a labelled card (Reason / Workspace…
🔧 Bug Fixes & Improvements
Playground: Gemma 4 no longer leaks reasoning traces — the internal thinking output from Gemma 4 models was appearing at the top of Playground responses; it is now suppressed so you see only the…
⚡ Performance Improvements
Dataset streaming uses less backend CPU — resolved a query pattern that caused the MySQL reader to scan all dataset versions on every /datasets/items/stream call; under high request volume this was…
🌍 Environment Tracking for Traces, Spans & Threads
You can now tag traces, spans, and threads with an environment field — production, staging, dev, or any label you define. This makes it easy to separate signal from noise: filter your project's trace…
🧪 Test Suite Assertions Can Now Inspect Sub-Spans
Test suite assertions can now look inside a trace — not just the top-level input/output — to reason about tool calls, intermediate LLM steps, and sub-agent behavior. The evaluator LLM gets access to…
⚡ Dramatically Faster Trace Table Loading
Traces and spans tables no longer download attachment bytes (images, PDFs) when loading a list — attachments are lazy-loaded only when you open an individual trace. In our benchmarks with image and…
🤖 OpenAI Playground: Per-Model reasoning_effort Support
The Playground's reasoning_effort control now tracks OpenAI's actual per-model capability matrix. Models like gpt-5.1 that support a "none" option show it; models that don't support reasoning effort…