
Here are the most relevant improvements we’ve made since the last release:

🚨 Native Slack and PagerDuty Alerts

We now offer native Slack and PagerDuty alert integrations, eliminating the need for any middleware configuration. Set up alerts directly in Opik to receive notifications when important events happen in your workspace.

With native integrations, you can:

  • Configure Slack channels directly from Opik settings
  • Set up PagerDuty incidents without additional webhook setup
  • Receive real-time notifications for errors, feedback scores, and critical events
  • Streamline your monitoring workflow with built-in integrations
Create alert form

👉 Read the full docs here - Alerts Guide

🖼️ Multimodal LLM-as-a-Judge Support for Visual Evaluation

LLM as a Judge metrics can now evaluate traces that contain images when using vision-capable models. This is useful for:

  • Evaluating image generation quality - Assess the quality and relevance of generated images
  • Analyzing visual content in multimodal applications - Evaluate how well your application handles visual inputs
  • Validating image-based responses - Ensure your vision models produce accurate and relevant outputs

To reference image data from traces in your evaluation prompts:

  • In the prompt editor, click the “Images +” button to add an image variable
  • Map the image variable to the trace field containing image data using the Variable Mapping section
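For the mapping step to work, the trace itself has to carry the image data in its input. Below is a minimal sketch of logging such a trace with the Python SDK; the argument name `image` and the base64 data-URL convention are illustrative choices, not a required schema.

```python
import base64

import opik


@opik.track
def describe_image(question: str, image: str) -> str:
    # @opik.track records the function arguments as the trace input, so the
    # `image` field becomes available for variable mapping in the prompt editor.
    return "The chart shows a steady upward trend."  # stand-in for a vision-model call


with open("chart.png", "rb") as f:
    image_data_url = "data:image/png;base64," + base64.b64encode(f.read()).decode()

describe_image("What trend does this chart show?", image_data_url)
```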

👉 Read more: Evaluating traces with images

✨ Prompt Generator & Improver

We’ve launched the Prompt Generator and Prompt Improver — two AI-powered tools that help you create and refine prompts faster, directly inside the Playground.

Designed for non-technical users, these features automatically apply best practices from OpenAI, Anthropic, and Google, helping you craft clear, effective, and production-grade prompts without leaving the Playground.

Why it matters

Prompt engineering is still one of the biggest bottlenecks in LLM development. With these tools, teams can:

  • Generate high-quality prompts from simple task descriptions
  • Improve existing prompts for clarity, specificity, and consistency
  • Iterate and test prompts seamlessly in the Playground

How it works

  • Prompt Generator → Describe your task in plain language; Opik creates a complete system prompt following proven design principles
  • Prompt Improver → Select an existing prompt; Opik enhances it following best practices

👉 Read the full docs: Prompt Generator & Improver

🔗 Advanced Prompt Integration in Spans & Traces

We’ve integrated prompts into spans and traces, creating a seamless connection between your Prompt Library, Traces, and the Playground.

You can now associate prompts directly with traces and spans using the opik_context module — so every execution is automatically tied to the exact prompt version used.

Knowing which prompt produced a given trace is key, whether you’re building a simple pipeline or an advanced multi-prompt, multi-agent system.

With this integration, you can:

  • Track which prompt version was used in each function or span
  • Audit and debug prompts directly from trace details
  • Reproduce or improve prompts instantly in the Playground
  • Close the loop between prompt design, observability, and iteration

Once added, your prompts appear in the trace details view — with links back to the Prompt Library and the Playground, so you can iterate in one click.
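As a rough sketch of what this looks like in code (the exact keyword for attaching prompts via opik_context.update_current_trace is an assumption here; the guide linked below has the authoritative signature), fetching a Prompt Library version and tying it to the current trace could look like this:

```python
import opik
from opik import opik_context

client = opik.Opik()

# Pull the prompt version you want to associate with this execution from the
# Prompt Library (assumes a prompt named "summarizer-prompt" with a {{text}}
# placeholder already exists in your workspace).
summarizer_prompt = client.get_prompt(name="summarizer-prompt")


@opik.track
def summarize(text: str) -> str:
    # Attach the prompt to the trace created by @opik.track so the trace
    # details view links back to this exact prompt version.
    # NOTE: `prompts=[...]` is an assumed parameter name; check the guide.
    opik_context.update_current_trace(prompts=[summarizer_prompt])
    rendered = summarizer_prompt.format(text=text)
    return rendered  # pass `rendered` to your model of choice here
```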

👉 Read more: Adding prompts to traces and spans

🧪 Better No-Code Experiment Capabilities in the Playground

We’ve introduced a series of improvements directly in the Playground to make experimentation easier and more powerful:

Key enhancements:

  1. Create or select datasets directly from the Playground
  2. Create or select online score rules - Choose which rules to apply on each run
  3. Pass dataset items to online score rules - This enables reference-based experiments, where outputs are automatically compared to expected answers or ground truth, making objective evaluation straightforward
  4. One-click navigation to experiment results - From the Playground, users can now:
    • Jump into the Single Experiment View to inspect metrics and examples in detail, or
    • Go to the Compare Experiments View to benchmark multiple runs side-by-side

📊 On-Demand Online Evaluation on Existing Traces and Threads

We’ve added on-demand online evaluation in Opik, letting users run metrics on already logged traces and threads — perfect for evaluating historical data or backfilling new scores.

How it works

Select traces/threads, choose any online score rule (e.g., Moderation, Equals, Contains), and run evaluations directly from the UI — no code needed.

Results appear inline as feedback scores and are fully logged for traceability.

This enables:

  • Fast, no-code evaluation of existing data
  • Easy retroactive measurement of model and agent performance
  • Historical data analysis without re-running traces

👉 Read more: Manual Evaluation

🤖 Agent Evaluation Guides

We’ve added two new comprehensive guides on evaluating agents:

1. Evaluating Agent Trajectories

This guide helps you verify that your agent is making the right tool calls before returning the final answer. It’s fundamentally about evaluating and scoring what happens within a trace.

👉 Read the full guide: Evaluating Agent Trajectories

2. Evaluating Multi-Turn Agents

Evaluating chatbots is tough because you need to assess not just a single LLM response but an entire conversation. This guide walks you through using the new opik.simulation.SimulatedUser to create simulated threads for your agent.

👉 Read the full guide: Evaluating Multi-Turn Agents

These new docs significantly strengthen our agent evaluation feature-set and include diagrams to visualize how each evaluation strategy works.

📦 Import/Export Commands

Added new command-line functions for importing and exporting Opik data: you can now export all traces, spans, datasets, prompts, and evaluation rules from a project to local JSON or CSV files, and import data from local JSON files into an existing project.

Top use cases

  • Migrate - Move data between projects or environments
  • Backup - Create local backups of your project data
  • Version control - Track changes to your prompts and evaluation rules
  • Data portability - Easily transfer your Opik workspace data

Read the full docs: Import/Export Commands


And much more! 👉 See full commit log on GitHub

Releases: 1.8.83, 1.8.84, 1.8.85, 1.8.86, 1.8.87, 1.8.88, 1.8.89, 1.8.90, 1.8.91, 1.8.92, 1.8.93, 1.8.94, 1.8.95, 1.8.96, 1.8.97


Here are the most relevant improvements we’ve made since the last release:

🚨 Alerts

We’ve launched Alerts — a powerful way to get automated webhook notifications from your Opik workspace whenever important events happen (errors, feedback scores, prompt changes, and more). Opik now sends an HTTP POST to your endpoint with rich, structured event data you can route anywhere.

Now, you can make Opik a seamless part of your end-to-end workflows! With the new Alerts you can:

  • Spot production errors in near-real time
  • Track feedback scores to monitor model quality and user satisfaction
  • Audit prompt changes across your workspace
  • Funnel events into your existing workflows and CI/CD pipelines

And this is just v1.0! We’ll keep adding events, advanced filtering, thresholds, and more fine-grained control in future iterations, always guided by community feedback.
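If you want to route these events yourself, the receiving end can be as small as the sketch below. The payload field names used here (event_type, payload) are illustrative assumptions; the Alerts Guide linked below describes the actual event schema.

```python
# Minimal webhook receiver sketch using Flask; the JSON fields accessed here
# are placeholders, not the documented schema.
from flask import Flask, request

app = Flask(__name__)


@app.route("/opik-alerts", methods=["POST"])
def opik_alert():
    event = request.get_json(force=True)
    # Route the event wherever you need it: Slack, PagerDuty, a ticket queue, ...
    print("Received Opik alert:", event.get("event_type"), event.get("payload"))
    return "", 204


if __name__ == "__main__":
    app.run(port=8000)
```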

Alerts configuration interface showing webhook setup and event types

Read the full docs here - Alerts Guide

🖼️ Expanded Multimodal Image Support

We’ve added better image support across our platform!

What’s new?

1. Image Support in LLM as a Judge online Evaluations - LLM as a Judge evaluations now support images alongside text, enabling you to evaluate vision models and multimodal applications. Upload images and get comprehensive feedback on both text and visual content.

2. Enhanced Playground Experience - The playground now supports image inputs, allowing you to test prompts with images before running full evaluations. Perfect for experimenting with vision models and multimodal prompts.

3. Improved Data Display - Base64 image previews in data tables, better image handling in trace views, and enhanced pretty formatting for multimodal content.

LLM Judge evaluation interface showing image support for multimodal evaluations

Links to official docs: Evaluating traces with images and Using images in the Playground

Opik Optimizer Updates

1. Multi-Metric Optimization - Support for optimizing multiple metrics simultaneously, with comprehensive frontend and backend changes. Read more

2. Hierarchical Reflective Optimizer - New optimizer with self-reflective capabilities. Read more about it here

Enhanced Feedback & Annotation experience

1. Improved Annotation Queue Export - Enhanced export functionality for annotation queues: export your annotated data seamlessly for further analysis.

2. Annotation Queue UX Enhancements

  • Hotkeys Navigation - Improved keyboard navigation throughout the interface for a fast annotation experience
  • Return to Annotation Queue Button - Easy navigation back to annotation queues
  • Resume Functionality - Continue annotation work where you left off
  • Queue Creation from Traces - Create annotation queues directly from trace tables

3. Inline Feedback Editing - Quickly edit user feedback directly in data tables with our new inline editing feature. Hover over feedback cells to reveal edit options, making annotation workflows faster and more intuitive.

Inline feedback editing interface showing hover-triggered edit options in data tables

Read more about our Annotation Queues

User Experience Enhancements

1. Dark Mode Refinements - Improved dark mode styling across UI components for better visual consistency and user experience.

Dark mode interface showing improved styling and visual consistency across UI components

2. Enhanced Prompt Readability - Better formatting and display of long prompts in the interface, making them easier to read and understand.

3. Improved Online Evaluation Page - Added search, filtering, and sorting capabilities to the online evaluation page for better data management.

4. Better token and cost control

  • Thread Cost Display - Show cost information in thread sidebar headers
  • Sum Statistics - Display sum statistics for cost and token columns in the traces table.
Total cost display showing cost information in thread sidebar headers and sum statistics
Total duration display showing duration statistics and timing information

5. Filter-Aware Metric Aggregation - Improved experiment item filtering in the experiment details tables for finer control over your data.

6. Pretty Mode Enhancements - Improved the Pretty mode for Input/Output display with better formatting and readability across the product.

TypeScript SDK Updates

  • Opik Configure Tool - New opik-ts configure tool with a guided developer experience and local flag support
  • Prompt Management - Comprehensive prompt management implementation
  • LangChain Integration - Aligned LangChain integration with Python architecture

Python SDK Improvements

  • Context Managers - New context managers for span and trace creation (see the sketch after this list)
  • Bedrock Integration - Enhanced Bedrock integration with invoke_model support
  • Trace Updates - New update_trace() method for easier trace modifications
  • Parallel Agent Support - Support for logging parallel agents in ADK integration
  • Enhanced feedback score handling with better category support
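Here is a minimal sketch of the new context managers (the entry-point names start_as_current_trace / start_as_current_span are assumed here; see the SDK reference for the exact API):

```python
import opik
from opik import opik_context

# Assumed entry points for the new context managers; the opik_context helpers
# used inside the managed scope are the existing, documented ones.
with opik.start_as_current_trace(name="checkout-flow", input={"cart_id": "c-42"}):
    with opik.start_as_current_span(name="price-lookup"):
        opik_context.update_current_span(output={"total": 19.99})
    opik_context.update_current_trace(output={"status": "ok"})
```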

Integration updates

1. OpenTelemetry Improvements

  • Thread ID Support - Added support for thread_id in OpenTelemetry endpoint
  • System Information in Telemetry - Enhanced telemetry with system information

2. Model Support Updates - Added support for Claude Haiku 4.5 and updated model pricing information across the platform.

And much more! 👉 See full commit log on GitHub

Releases: 1.8.63, 1.8.64, 1.8.65, 1.8.66, 1.8.67, 1.8.68, 1.8.69, 1.8.70, 1.8.71, 1.8.72, 1.8.73, 1.8.74, 1.8.75, 1.8.76, 1.8.77, 1.8.78, 1.8.79, 1.8.80, 1.8.81, 1.8.82, 1.8.83


Here are the most relevant improvements we’ve made since the last release:

📝 Multi-Value Feedback Scores & Annotation Queues

We’re excited to announce major improvements to our evaluation and annotation capabilities!

What’s new?

1. Multi-Value Feedback Scores - Multiple users can now independently score the same trace or thread. No more overwriting each other’s input: every reviewer’s perspective is preserved and visible in the product. This enables richer, more reliable consensus-building during evaluation.

2. Annotation Queues - Create queues of traces or threads that need expert review. Share them with SMEs through simple links. Organize work systematically, track progress, and collect both structured and unstructured feedback at scale.

3. Simplified Annotation Experience - A clean, focused UI designed for non-technical reviewers. Support for clear instructions, predefined feedback metrics, and progress indicators. Lightweight and distraction-free, so SMEs can concentrate on providing high-quality feedback.

Annotation Queues interface showing SME workflow and feedback collection

Full Documentation: Annotation Queues

🚀 Opik Optimizer - GEPA Algorithm & MCP Tool Optimization

What’s new?

1. GEPA (Genetic-Pareto) Support - GEPA is a new prompt-optimization algorithm from Stanford. Adding it bolsters our existing optimizers with the latest research, giving users more options.

2. MCP Tool Calling Optimization - The ability to tune MCP servers (external tools used by LLMs). Our solution builds on our existing MetaPrompter algorithm, using LLMs to tune how an LLM interacts with an MCP tool. The final output is a new tool signature that you can commit back to your code.

GEPA Optimizer interface showing genetic-pareto algorithm for prompt optimization

Full Documentation: Tool Optimization | GEPA Optimizer

🔍 Dataset & Search Enhancements

  • Added dataset search and dataset items download functionality

🐍 Python SDK Improvements

  • Granular support for choosing dataset items in experiments
  • Better project name setting and onboarding
  • Calculation of mean/min/max/std for each metric in experiments
  • Updated the CrewAI integration to support CrewAI flows

🎨 UX Enhancements

  • Add clickable links in trace metadata
  • Add description field to feedback definitions

And much more! 👉 See full commit log on GitHub

Releases: 1.8.43, 1.8.44, 1.8.45, 1.8.46, 1.8.47, 1.8.48, 1.8.49, 1.8.50, 1.8.51, 1.8.52, 1.8.53, 1.8.54, 1.8.55, 1.8.56, 1.8.57, 1.8.58, 1.8.59, 1.8.60, 1.8.61, 1.8.62


Here are the most relevant improvements we’ve made since the last release:

🔍 Opik Trace Analyzer Beta is Live!

We’re excited to announce the launch of Opik Trace Analyzer on Opik Cloud!

What this means: faster debugging & analysis!

Our users can now easily understand, analyze, and debug their development and production traces.

Want to give it a try? All you need to do is go to one of your traces and click on “Inspect trace” to start getting valuable insights.

Opik Trace Analyzer Beta interface showing trace analysis and debugging features

✨ Features and Improvements

  • We’ve finally added dark mode support! This feature has been requested many times by our community members. You can now switch your theme in your account settings.
Dark mode theme toggle in Opik account settings showing light and dark theme options
  • Now you can filter the widgets in the metrics tab by trace and thread attributes
Metrics tab filters showing trace and thread attribute filtering options
  • Annotating tons of threads? We’ve added the ability to export feedback score comments for threads to CSV for easier analysis in external tools.
  • We have also improved the discoverability of the experiment comparison feature.
  • Added new filter operators to the Experiments table
Experiment table filter operators showing advanced filtering options
  • Adding assets as part of your experiment’s metadata? We now display clickable links in the experiment config tab for easier navigation.
Clickable assets and metadata links in experiment configuration tab showing improved navigation

📚 Documentation

  • We’ve released Opik University! This is a new section of the docs full of video guides explaining the product.
Opik University documentation section showing video guides and tutorials

🔌 SDK & Integration Improvements

And much more! 👉 See full commit log on GitHub

Releases: 1.8.34, 1.8.35, 1.8.36, 1.8.37, 1.8.38, 1.8.39, 1.8.40, 1.8.41, 1.8.42


Here are the most relevant improvements we’ve made in the last couple of weeks:

🧪 Experiment Grouping

Instantly organize and compare experiments by model, provider, or custom metadata to surface top performers, identify slow configurations, and discover winning parameter combinations. The new Group by feature provides aggregated statistics for each group, making it easier to analyze patterns across hundreds of experiments.

Experiment Grouping Interface

🤖 Expanded Model Support

Added support for 144+ new models, including:

  • OpenAI’s GPT-5 and GPT-4.1-mini
  • Anthropic Claude Opus 4.1
  • Grok 4
  • DeepSeek v3
  • Qwen 3

🛫 Streamlined Onboarding

New quick start experience with AI-assisted installation, interactive setup guides, and instant access to team collaboration features and support.

New Onboarding Experience

🔌 Integrations

Enhanced support for leading AI frameworks including:

  • LangChain: Improved token usage tracking functionality
  • Bedrock: Comprehensive cost tracking for Bedrock models

🔍 Custom Trace Filters

Advanced filtering capabilities with support for list-like keys in trace and span filters, enabling precise data segmentation and analysis across your LLM operations.

⚡ Performance Optimizations

  • Python scoring performance improvements with pre-warming
  • Optimized ClickHouse async insert parameters
  • Improved deduplication for spans and traces in batches

🛠️ SDK Improvements

  • Python SDK configuration error handling improvements
  • Added dataset & dataset item ID to evaluate task inputs
  • Updated OpenTelemetry integration

And much more! 👉 See full commit log on GitHub

Releases: 1.8.16, 1.8.17, 1.8.18, 1.8.19, 1.8.20, 1.8.21, 1.8.22, 1.8.23, 1.8.24, 1.8.25, 1.8.26, 1.8.27, 1.8.28, 1.8.29, 1.8.30, 1.8.31, 1.8.32, 1.8.33


🎯 Advanced Filtering & Search Capabilities

We’ve expanded filtering and search capabilities to help you find and analyze data more effectively:

  • Custom Trace Filters: Support for custom filters on input/output fields for traces and spans, allowing more precise data filtering
  • Enhanced Search: Improved search functionality with better result highlighting and local search within code blocks
  • Filtering Fix: Filtering no longer crashes on values that contain special characters such as %
  • Dataset Filtering: Added support for experiments filtering by datasetId and promptId
Filtering and Search Interface

📊 Metrics & Analytics Improvements

We’ve enhanced the metrics and analytics capabilities:

  • Thread Feedback Scores: Added comprehensive thread feedback scoring system for better conversation quality assessment
  • Thread Duration Monitoring: New duration widgets in the Metrics dashboard for monitoring conversation length trends
  • Online Evaluation Rules: Added ability to enable/disable online evaluation rules for more flexible monitoring
  • Cost Optimization: Reduced cost prompt queries to improve performance and reduce unnecessary API calls

🎨 UX Enhancements

We’ve made several UX improvements to make the platform more intuitive and efficient:

  • Full-Screen Popup Improvements: Enhanced the full-screen popup experience with better navigation and usability
  • Tag Component Optimization: Made tag components smaller and more compact for better space utilization
  • Column Sorting: Enabled sorting and filtering on all Prompt columns for better data organization
  • Multi-Item Tagging: Added ability to add tags to multiple items in the Traces and Spans tables simultaneously

🔌 SDK, integrations and docs

  • LangChain Integration: Enhanced LangChain integration with improved provider and model logging
  • Google ADK Integration: Updated Google ADK integration with better graph building capabilities
  • Bedrock Integration: Added comprehensive cost tracking support for ChatBedrock and ChatBedrockConverse

🔒 Security & Stability Enhancements

We’ve implemented several security and stability improvements:

  • Dependency Updates: Updated critical dependencies including MySQL connector, OpenTelemetry, and various security patches
  • Error Handling: Improved error handling and logging across the platform
  • Performance Monitoring: Enhanced NewRelic support for better performance monitoring
  • Sentry Integration: Added more metadata about package versions to Sentry events for better debugging

And much more! 👉 See full commit log on GitHub

Releases: 1.8.7, 1.8.8, 1.8.9, 1.8.10, 1.8.11, 1.8.12, 1.8.13, 1.8.14, 1.8.15, 1.8.16


🧵 Thread-level LLMs-as-Judge

We now support thread-level LLMs-as-a-Judge metrics!

We’ve implemented Online evaluation for threads, enabling the evaluation of entire conversations between humans and agents.

This allows for scalable measurement of metrics such as user frustration, goal achievement, conversational turn quality, clarification request rates, alignment with user intent, and much more.

We’ve also implemented Python metrics support for threads, giving you full code control over metric definitions.
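Custom thread metrics follow the usual BaseMetric pattern from opik.evaluation.metrics. The sketch below scores a whole conversation passed in as a list of chat messages; exactly how the thread is handed to your metric (the conversation argument here) is an assumption, so check the thread evaluation docs for the real hook.

```python
from typing import Any, Dict, List

from opik.evaluation.metrics import base_metric, score_result


class UserRepetition(base_metric.BaseMetric):
    """Toy heuristic: penalize conversations where the user repeats a message."""

    def __init__(self, name: str = "user_repetition"):
        super().__init__(name=name)

    def score(self, conversation: List[Dict[str, str]], **kwargs: Any) -> score_result.ScoreResult:
        # `conversation` is assumed to be a list of {"role": ..., "content": ...} dicts.
        user_turns = [m["content"] for m in conversation if m.get("role") == "user"]
        repeated = len(user_turns) != len(set(user_turns))
        return score_result.ScoreResult(
            name=self.name,
            value=0.0 if repeated else 1.0,
            reason="User repeated a message" if repeated else "No repeated user messages",
        )
```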

Thread Online Score Interface

To improve visibility into trends and to help detect spikes in these metrics when the agent is running in production, we’ve added Thread Feedback Scores and Thread Duration widgets to the Metrics dashboard. These additions make it easier to monitor changes over time in live environments.

Thread Metrics Interface

🔍 Improved Trace Inspection Experience

Once you’ve identified problematic sessions or traces, we’ve made it easier to inspect and analyze them with the following improvements:

  • Field Selector for Trace Tree: Quickly choose which fields to display in the trace view.
  • Span Type Filter: Filter spans by type to focus on what matters.
  • Improved Agent Graph: Now supports full-page view and zoom for easier navigation.
  • Free Text Search: Search across traces and spans freely without constraints.
  • Better Search Usability: search results are now highlighted and local search is available within code blocks.
Thread Improvements Interface

📊 Spans Tab Improvements

The Spans tab provides a clearer, more comprehensive view of agent activity to help you analyze tool and sub-agent usage across threads, uncover trends, and spot latency outliers more easily.

What’s New:

  • LLM Calls → Spans: we’ve renamed the LLM Calls tab to Spans to reflect broader coverage and richer insights.
  • Unified View: see all spans in one place, including LLM calls, tools, guardrails, and more.
  • Span Type Filter: quickly filter spans by type to focus on what matters most.
  • Customizable Columns: highlight key span types by adding them as dedicated columns.

These improvements make it faster and easier to inspect agent behavior and performance at a glance.

Spans Table Filter Interface

📈 Experiments Improvements

Slow model response times can lead to frustrating user experiences and create hidden bottlenecks in production systems. However, identifying latency issues early (during experimentation) is often difficult without clear visibility into model performance.

To help address this, we’ve added Duration as a key metric for monitoring model latency in the Experiments engine. You can now include Duration as a selectable column in both the Experiments and Experiment Details views. This makes it easier to identify slow-responding models or configurations early, so you can proactively address potential performance risks before they impact users.

Experiment Duration Interface

📦 Enhanced Data Organization & Tagging

When usage grows and data volumes increase, effective data management becomes crucial. We’ve added several capabilities to make team workflows easier:

  • Tagging, filtering, and column sorting support for Prompts
  • Tagging, filtering, and column sorting support for Datasets
  • Ability to add tags to multiple items in the Traces and Spans tables

🤖 New Models Support

We’ve added support for:

  • OpenAI GPT-4.1 and GPT-4.1-mini models
  • Anthropic Claude 4 Sonnet model

🌐 Integration Updates

We’ve enhanced several integrations:

  • Agent graph building for Google ADK agents
  • LangChain integration now logs provider, model, and usage when using Google Generative AI models
  • Groq LLM usage tracking support in the LangChain integration

And much more! 👉 See full commit log on GitHub

Releases: 1.8.0, 1.8.1, 1.8.2, 1.8.3, 1.8.4, 1.8.5, 1.8.6


🛠 Agent Optimizer 1.0 released!

The Opik Agent Optimizer now supports full agentic systems, not just single prompts.

With support for LangGraph, Google ADK, PydanticAI, and more, this release brings a simplified API, model customization for evaluation, and standardized interfaces to streamline optimization workflows. Learn more in the docs.

🧵 Thread-level improvements

Added Thread-Level Feedback, Tags & Comments: You can now add expert feedback scores directly at the thread level, enabling SMEs to review full agent conversations, flag risks, and collaborate with dev teams more effectively. We’ve also added support for thread-level tags and comments to streamline workflows and improve context sharing.

🖥️ UX improvements

  • We’ve redesigned the Opik Home Page to deliver a cleaner, more intuitive first-use experience, with a focused value proposition, direct access to key metrics, and a polished look. The demo data has also been upgraded to showcase Opik’s capabilities more effectively for new users. Additionally, we’ve added inter-project comparison capabilities for metrics and cost control, allowing you to benchmark and monitor performance and expenses across multiple projects.
  • Improved Error Visualization: Enhanced how span-level errors are surfaced across the project. Errors now bubble up to the project view, with quick-access shortcuts to detailed error logs and variation stats for better debugging and error tracking.

  • Improved Sidebar Hotkeys: Updated sidebar hotkeys for more efficient keyboard navigation between items and detail views.

🔌 SDK, integrations and docs

  • Added LangChain support in metric classes, allowing you to use LangChain as a model proxy alongside LiteLLM for flexible LLM judge customization.
  • Added support for the Gemini 2.5 model family.
  • Updated pretty mode to support Dify and LangGraph + OpenAI responses.
  • Added the OpenAI agents integration cookbook (link).
  • Added a cookbook on how to import Huggingface Datasets to Opik

👉 See full commit log on GitHub

Releases: 1.7.37, 1.7.38, 1.7.39, 1.7.40, 1.7.41, 1.7.42


🔌 Integrations and SDK

  • Added Cloudflare Workers AI integration (docs)
  • Google ADK integration: tracing is now automatically propagated to all sub-agents in agentic systems with the new track_adk_agent_recursive feature, eliminating the need to manually add tracing to each sub-agent.
  • Google ADK integration: now we retrieve session-level information from the ADK framework to enrich the threads data.
  • New in the SDK! Real-time tracking for long-running spans/traces is now supported. When enabled (set os.environ["OPIK_LOG_START_TRACE_SPAN"] = "True" in your environment), you can see traces and spans update live in the UI—even for jobs that are still running. This makes debugging and monitoring long-running agents much more responsive and convenient.
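For example, enabling that flag before any tracing starts is enough for functions decorated with @opik.track to show up in the UI while they are still running:

```python
import os

# Set before any traces/spans are created so the SDK logs them at start time,
# not only on completion.
os.environ["OPIK_LOG_START_TRACE_SPAN"] = "True"

import opik


@opik.track
def long_running_job(query: str) -> str:
    # While this executes, the trace and span already appear in the UI and
    # keep updating until the function returns.
    ...
    return "done"
```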

🧵 Threads improvements

  • Added Token Count and Cost Metrics in Thread table
  • Added Sorting on all Thread table columns
  • Added Navigation from Thread Detail to all related traces
  • Added support for “pretty mode” in OpenAI Agents threads

🧪 Experiments improvements

  • Added support for filtering experiments by configuration metadata. It is now also possible to add a column displaying the configuration in the experiments table.

🛠 Agent Optimizer improvements

  • New Public API for Agent Optimization
  • Added optimization run display link
  • Added optimization_context

🛡️ Security Fixes

  • Fixed: h11 accepted some malformed Chunked-Encoding bodies
  • Fixed: setuptools had a path traversal vulnerability in PackageIndex.download that could lead to Arbitrary File Write
  • Fixed: LiteLLM had an Improper Authorization Vulnerability

👉 See full commit log on GitHub

Releases: 1.7.32, 1.7.33, 1.7.34, 1.7.35, 1.7.36


💡 Product Enhancements

  • Ability to upload CSV datasets directly through the user interface
  • Added experiment cost tracking to the Experiments table
  • Added hints and helpers for onboarding new users across the platform
  • Added “LLM calls count” to the traces table
  • Pretty formatting for complex agentic threads
  • Preview support for MP3 files in the frontend

🛠 SDKs and API Enhancements

  • Good news for JS developers! We’ve released experiments support for the JS SDK (official docs coming very soon)
  • New Experiments Bulk API: a new API has been introduced for logging Experiments in bulk.
  • Rate Limiting improvements both in the API and the SDK

🔌 Integrations

  • Support for OpenAI o3-mini and Groq models added to the Playground
  • OpenAI Agents: context awareness implemented and robustness improved, with better thread handling
  • Google ADK: added support for multi-agent integration
  • LiteLLM: token and cost tracking added for SDK calls. Integration now compatible with opik.configure(…)
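To pick up the LiteLLM compatibility mentioned in the last bullet, the wiring is just LiteLLM’s built-in Opik logger plus a configured Opik SDK; a minimal sketch:

```python
import litellm
import opik
from litellm.integrations.opik.opik import OpikLogger

# Point the Opik SDK at your workspace (prompts interactively if no
# credentials are configured yet).
opik.configure()

# Register LiteLLM's Opik callback so every completion call is traced,
# including token usage and cost.
litellm.callbacks = [OpikLogger()]

response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello to Opik!"}],
)
print(response.choices[0].message.content)
```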

👉 See full commit log on GitHub

Releases: 1.7.27, 1.7.28, 1.7.29, 1.7.30, 1.7.31