## Prerequisites
Before you begin, you'll need to choose how you want to use Opik:
* **Opik Cloud**: Create a free account at [comet.com/opik](https://www.comet.com/signup?from=llm&utm_source=opik&utm_medium=colab&utm_content=quickstart&utm_campaign=opik)
* **Self-hosting**: Follow the [self-hosting guide](/self-host/overview) to deploy Opik locally or on Kubernetes
## Logging your first LLM calls
Opik makes it easy to integrate with your existing LLM application. The best way to get started is to use the Opik skill:
## Meet Ollie
Ollie is a conversational AI assistant built into Opik. It lives next to every trace, dataset, experiment, and prompt you've logged — and when you pair your local project with `opik connect`, Ollie can also read your source files, run your agent, and propose code changes. One assistant, full context: your data on one side, your code on the other.
## What Ollie can do
### Let Ollie read your code
Once Ollie identifies *where* the bug lives, it asks to read the relevant source file. With `opik connect` running on your machine, it has secure access to the files in your project — and shows you exactly what it's looking at.
### Approve the fix
Ollie proposes a change. You see the diff, approve it, and the file is updated on your machine. Nothing happens without your click.
### Rerun the agent from Opik
With the fix applied locally, Ollie reruns your agent through `opik connect` using the same inputs from the original failing trace. The new trace streams back into Opik in real time.
### Verify with a test suite
Ask Ollie to add the original trace to a test suite as a regression case, then run the suite against the updated agent. You get a pass/fail summary — and a test that will catch the bug if it ever comes back.
### ⌨️ Using Comet Debugger Mode (UI/Browser)
**Comet Debugger Mode** is a hidden diagnostic feature in the **Opik web application** that displays real-time technical information to help you troubleshoot issues. This mode is particularly useful when investigating connectivity problems, reporting bugs, or verifying your deployment version.
**To toggle Comet Debugger Mode:**
Press `Command + Shift + .` on macOS or `Ctrl + Shift + .` on Windows/Linux
**What it displays:**
* **Network Status**: Real-time connectivity indicator with RTT (Round Trip Time) showing latency to the Opik backend server in seconds
* **Opik Version**: The current version of Opik you're running (click to copy to clipboard)
This information is helpful when:
* Reporting issues to the Opik team (include the version number and RTT)
* Verifying your Opik version matches expected deployment
* Diagnosing connectivity problems between UI and backend (check RTT for latency issues)
* Troubleshooting UI-related issues or unexpected behavior
* Confirming successful updates or deployments
* Monitoring network performance and latency to the backend server
**How it works:**
The keyboard shortcut toggles the debug information overlay on and off. When enabled, a small
status bar appears in the UI showing the network connectivity status and version information.
The mode persists across browser sessions (stored in local storage), so you only need to enable
it once until you toggle it off again.
## Why use Opik for observability
Debugging LLM applications without observability means guessing. You see the final output but not why the model hallucinated, which retrieval step returned irrelevant context, or where latency spiked.
With Opik, you can:
* **See the full execution path** of every request — from user input through tool calls and LLM completions to the final response
* **Root-cause production issues fast** — filter and search traces by status, latency, cost, or custom tags to find the problem in seconds
* **Track costs and latency over time** — monitor token usage and spending across models and providers
* **Capture multi-turn conversations** — group related traces into threads to understand how interactions evolve across turns
* **Close the feedback loop** — attach human or automated scores to traces and use them to drive evaluations
## What you can capture
You can use [Ollie](/tracing/ollie) to analyze your traces, identify issues in your agent's
behavior, and get actionable suggestions for improvement.
## Next steps
* [Concepts](/tracing/concepts) — Learn about traces, spans, threads, and feedback scores
* [Log traces](/tracing/advanced/log_traces) — In-depth guide on customizing what gets logged
* [Cost tracking](/tracing/advanced/cost_tracking) — Monitor token usage and spending
***
# Tracing Core Concepts
Understanding the fundamental concepts behind Opik's tracing platform.
Opik supports agent observability using our [Typescript SDK](/reference/typescript-sdk/overview),
[Python SDK](/reference/python-sdk/overview), [first class OpenTelemetry support](/integrations/opentelemetry)
and our [REST API](/reference/rest-api/overview).
Integrate with Opik faster using this pre-built prompt
As a next step, you can create an [offline evaluation](/v1/evaluation/evaluate_prompt) to evaluate your
agent's performance on a fixed set of samples.
## Advanced usage
### Using function decorators
Function decorators are a great way to add Opik logging to your existing application. When you add
the `@track` decorator to a function, Opik will create a span for that function call and log the
input parameters and function output for that function. If we detect that a decorated function
is being called within another decorated function, we will create a nested span for the inner
function.
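As a minimal sketch of this nesting behavior (the function names here are illustrative):
```python
from opik import track

@track
def retrieve_context(question: str) -> str:
    # Called from another decorated function, so this is logged as a nested span
    return "some retrieved context"

@track
def answer_question(question: str) -> str:
    # The outermost decorated call creates the trace and root span
    context = retrieve_context(question)
    return f"Answer based on: {context}"

answer_question("What is Opik?")
```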
While decorators are most popular in Python, we also support them in our Typescript SDK:
## Understanding Threads
Threads in Opik are collections of traces that are grouped together using a unique `thread_id`. This is particularly useful for:
* **Multi-turn conversations**: Track complete chat sessions between users and AI assistants
* **User sessions**: Group all interactions from a single user session
* **Conversational agents**: Follow the flow of agent interactions and tool usage
* **Workflow tracking**: Monitor complex workflows that span multiple function calls
The `thread_id` is a user-defined identifier that must be unique per project. All traces with the same `thread_id` will be grouped together and displayed as a single conversation thread in the Opik UI.
## Logging conversations
You can log chat conversations by specifying the `thread_id` parameter when using either the low level SDK, Python decorators, or integration libraries:
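For example, with the Python decorator (a minimal sketch; the `thread_id` value is any identifier you choose):
```python
from opik import opik_context, track

@track
def chat_turn(user_message: str, thread_id: str) -> str:
    # Every trace that shares this thread_id is grouped into one conversation
    opik_context.update_current_trace(thread_id=thread_id)
    return "Placeholder agent response"

chat_turn("Hello!", thread_id="conversation-42")
chat_turn("What can you do?", thread_id="conversation-42")
```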
## Scoring conversations
You can assign conversation-level feedback scores to threads at any time. Threads aggregate the traces created when tracking agents, or simply traces interconnected by a `thread_id`.
In the conversation list, you can see the feedback scores associated with each thread.
You can also tag a thread and add comments to it. This is useful to add additional context during the review process or investigate a specific conversation.
### Thread Online Scoring Rule Cooldown Period
For thread-level online evaluation rules (automatic scoring), Opik waits for a "cooldown period" after the last activity
in a thread before running the rules. This gives conversations time to settle before automatic evaluation.
## Logging Attachments
In the Python SDK, you can use the `Attachment` type to add files to your traces.
Attachments can be images, videos, audio files, or any other file that you might
want to log to Opik.
Each attachment is made up of the following fields:
* `data`: The path to the file, raw bytes, or a base64 encoded string of the file
* `file_name`: Optional name for the attachment (required when using raw bytes without a file path)
* `content_type`: The content type of the file formatted as a MIME type
These attachments can then be logged to your traces and spans using the
`opik_context.update_current_span` and `opik_context.update_current_trace`
methods:
### Using file paths
The most common way to log attachments is by providing a file path:
```python wordWrap
from opik import opik_context, track, Attachment

@track
def my_llm_agent(input):
    # LLM chain code
    # ...

    # Update the trace with a file path
    opik_context.update_current_trace(
        attachments=[
            Attachment(
                data="/path/to/attachment.png",
                content_type="image/png",
            )
        ]
    )
    return "result"
```
## Embedded Attachments
When you embed base64-encoded media directly in your trace/span `input`, `output`, or `metadata` fields, Opik automatically optimizes storage and retrieval for performance.
### How It Works
For base64-encoded content larger than 250KB, Opik automatically extracts and stores it separately. This happens transparently - you don't need to change your code.
When you retrieve your traces or spans later, the attachments are automatically included by default. For faster queries when you don't need the attachment data, use the `strip_attachments=true` parameter.
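For illustration, a sketch of a traced function whose logged input embeds a base64 string (the function body is a placeholder for a real model call):
```python
import base64
from opik import track

@track
def describe_image(image_b64: str) -> str:
    # The base64 input is logged with the trace; above 250KB, Opik
    # extracts it and stores it as an attachment automatically
    return "a description of the image"

with open("photo.png", "rb") as f:
    describe_image(base64.b64encode(f.read()).decode())
```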
### Size Limits
Opik Cloud supports embedded attachments up to **100MB per field**. This limit applies to individual string values in your `input`, `output`, or `metadata` fields.
Opik supports logging agent graphs for the following frameworks:
1. LangGraph
2. Google Agent Development Kit (ADK)
3. Manual Tracking
## LangGraph
You can log the agent execution graph by specifying the `graph` parameter in the
[OpikTracer](https://www.comet.com/docs/opik/python-sdk-reference/integrations/langchain/OpikTracer.html) callback:
```python
from opik.integrations.langchain import OpikTracer
opik_tracer = OpikTracer(graph=app.get_graph(xray=True))
```
Opik will log the agent graph definition in the Opik dashboard which you can access by clicking on
`Show Agent Graph` in the trace sidebar.
## Google Agent Development Kit (ADK)
Opik automatically generates visual representations of your agent workflows for Google ADK without requiring any additional configuration. Simply integrate Opik's OpikTracer callback as shown in the [ADK integration configuration guide](https://www.comet.com/docs/opik/integrations/adk#configuring-google-adk), and your agent graphs will be automatically captured and visualized.
The graph automatically shows:
* Agent hierarchy and relationships
* Sequential execution flows
* Parallel processing branches
* Tool connections and dependencies
* Loop structures and iterations
For example, a basic weather and time agent will display its execution flow with all agent steps, LLM calls, and tool invocations:
For more complex multi-agent architectures, the automatic graph visualization becomes even more valuable, providing clear visibility into nested agent hierarchies and complex execution patterns.
## Manual Tracking
You can also log the agent graph definition manually by logging the agent graph definition as a
mermaid graph definition in the metadata of the trace:
```python
import opik
from opik import opik_context
@opik.track
def chat_agent(input: str):
# Update the current trace with the agent graph definition
opik_context.update_current_trace(
metadata={
"_opik_graph_definition": {
"format": "mermaid",
"data": "graph TD; U[User]-->A[Agent]; A-->L[LLM]; L-->A; A-->R[Answer];"
}
}
)
return "Hello, how can I help you today?"
chat_agent("Hi there!")
```
Opik will log the agent graph definition in the Opik dashboard which you can access by clicking on
`Show Agent Graph` in the trace sidebar.
## Next steps
Why not check out:
* [Opik's 50+ integrations](/integrations/overview)
* [Logging traces](/tracing/advanced/log_traces)
* [Evaluating agents](/evaluation/evaluate_agents)
***
# Log distributed traces
When working with complex LLM applications, it is common to need to track traces across multiple services. Opik supports distributed tracing out of the box when integrating via function decorators, using a mechanism similar to how OpenTelemetry implements distributed tracing.
For the purposes of this guide, we will assume a simple LLM application made up of two services: a client and a server. The client creates the trace and span, while the server adds a nested span. To make this work, the `trace_id` and `span_id` are passed in the headers of the request from the client to the server.

The Python SDK includes some helper functions to make it easier to fetch headers in the client and ingest them in the server:
```python title="client.py"
from opik import track, opik_context
@track()
def my_client_function(prompt: str) -> str:
headers = {}
# Update the headers to include Opik Trace ID and Span ID
headers.update(opik_context.get_distributed_trace_headers())
# Make call to backend service
response = requests.post("http://.../generate_response", headers=headers, json={"prompt": prompt})
return response.json()
```
On the server side, you can pass the headers to your decorated function:
```python title="server.py"
from opik import track
from fastapi import FastAPI, Request
@track()
def my_llm_application():
pass
app = FastAPI() # Or Flask, Django, or any other framework
@app.post("/generate_response")
def generate_llm_response(request: Request) -> str:
return my_llm_application(opik_distributed_trace_headers=request.headers)
```
Once a feedback score has been provided, you can also add a reason to explain why this particular
score was given. This is useful for adding additional context to the score.
If multiple team members are annotating the same trace, you can see each team member's annotations
in the UI in the `Feedback scores` section, together with the average score.
Reviewing the health check output can help pinpoint the source of the problem and suggest possible resolutions.
### TypeScript SDK Troubleshooting
#### Configuration Validation Errors
The TypeScript SDK validates configuration at startup. Common errors:
* **"OPIK\_URL\_OVERRIDE is not set"**: Set the `OPIK_URL_OVERRIDE` environment variable
* **"OPIK\_API\_KEY is not set"**: Required for Opik Cloud deployments
* **"OPIK\_WORKSPACE is not set"**: Optional, but can be set for Opik Cloud deployments
#### Debug Logging
Enable debug logging to troubleshoot issues:
```bash
export OPIK_LOG_LEVEL="DEBUG"
```
If you are using the Opik Optimizer SDK, you can also enable optimizer-side debug logs:
```bash
export OPIK_OPTIMIZER_LOG_LEVEL="DEBUG"
```
Or programmatically:
```typescript
import { setLoggerLevel } from "opik";
setLoggerLevel("DEBUG");
```
#### Batch Queue Issues
If data isn't appearing in Opik:
1. **Check if data is batched**: Call `await client.flush()` to force sending
2. **Verify configuration**: Ensure correct API URL and credentials
3. **Check network connectivity**: Verify firewall and proxy settings
### General Troubleshooting
#### Environment Variables Not Loading
1. **Python**: Ensure `load_dotenv()` is called before importing `opik`
2. **TypeScript**: The SDK automatically loads `.env` files
3. **Verify file location**: `.env` file should be in project root
4. **Check file format**: No spaces around `=` in `.env` files
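For reference, a minimal `.env` file that follows these rules (using variable names that appear elsewhere in this guide):
```bash
# No spaces around the equals sign
OPIK_API_KEY=your-api-key
OPIK_WORKSPACE=your-workspace
```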
#### Configuration File Issues
1. **File location**: Default is `~/.opik.config`
2. **Custom location**: Use `OPIK_CONFIG_PATH` environment variable
3. **File format**: Python uses TOML, TypeScript uses INI format
4. **Permissions**: Ensure file is readable by your application
***
# Offline fallback and message replay
The Opik Python SDK includes a built-in offline fallback mechanism that protects your tracing data during
network outages. When the SDK cannot reach the Opik server, messages are automatically persisted to a local
SQLite database. Once connectivity is restored, all stored messages are replayed to the server transparently,
with no changes required in your application code.
If you have any feedback or feature requests for dashboards, please [open an issue on GitHub](https://github.com/comet-ml/opik/issues).
## Dashboard types
Every dashboard has a **type** that determines what kind of data it works with and which widgets are available:
| Type | Purpose | Available widgets |
| ----------------- | -------------------------------------------------------------------------- | ------------------------------------ |
| **Multi-project** | Track metrics across one or more projects (traces, threads, cost, latency) | Time series, Single metric, Markdown |
| **Experiments** | Compare feedback scores and results across experiment runs | Metrics, Leaderboard, Markdown |
## Accessing dashboards
### Dashboards page
Access the standalone Dashboards page from the sidebar navigation to create and manage workspace-level dashboards. The dashboards list includes a **Type** column showing whether each dashboard is Multi-project or Experiments.
### Project page — Insights tab
Within any project, the **Insights** tab provides built-in and custom views for monitoring that project's traces, threads, and quality metrics.
### Compare Experiments — Insights tab
When comparing experiments, the **Insights** tab shows a built-in read-only view with experiment comparison charts.
## Insights tab
The Insights tab provides curated, in-context monitoring views directly within project and experiment pages.
### Project Insights
When you open a project's Insights tab, you land on the built-in **Project Overview** view — a read-only dashboard covering key health metrics: trace volume, errors, latency, cost, feedback scores, and thread activity.
#### Custom views
Beyond the built-in view, you can create custom Insight views for your project:
1. Open the **views selector** dropdown in the Insights tab
2. Click **Add new** at the bottom
3. Enter a name for your view
Custom views are fully editable — you can add sections, configure widgets, and rearrange the layout. The current project is automatically set as the data source for all widgets.
**Views selector dropdown:**
* Search box at the top for filtering views
* Built-in "Project Overview" is always listed first with a "Built-in" tag
* Custom views appear below with their widget count and last modified date
* **Add new** button at the bottom
**View actions** (available on hover for custom views):
* **Edit name** — rename the view
* **Duplicate** — create a copy of the view
* **Delete** — remove the view (this action cannot be undone)
You can also **duplicate the built-in view** to create an editable copy as a custom view.
The Insights tab has its own time range selector, separate from the Logs tab. Each tab remembers its own time range across sessions.
### Experiment Insights
When comparing experiments, the Insights tab shows a single built-in read-only view displaying experiment comparison charts for the currently selected experiments. There is no view selector — only the built-in view is available.
## Widget types
Dashboards support several widget types. The available types depend on the dashboard type (Multi-project or Experiments).
### Time series widget (Multi-project)
Displays time-series charts for project metrics over time. Supports both line and bar chart visualizations.
**Available metrics:**
* **Trace feedback scores** - Quality metrics for traces over time
* **Number of traces** - Trace volume trends
* **Trace duration** - Trace performance trends
* **Token usage** - Token consumption over time
* **Estimated cost** - Spending trends
* **Failed guardrails** - Guardrail violations over time
* **Number of threads** - Thread volume trends
* **Thread duration** - Thread performance trends
* **Thread feedback scores** - Quality metrics for threads over time
**Configuration options:**
* **Project**: Select the project to pull data from
* **Metric type**: Choose from any of the metrics listed above
* **Chart type**: Line chart (best for trends) or Bar chart (good for volume/period comparisons)
* **Breakdown**: Optionally group data by a field to see per-group patterns. Available fields depend on the data source:
* Trace metrics: Tags, Name, Has error, Error type, Metadata key
* Span metrics: Tags, Name, Has error, Error type, Metadata key, Model, Provider, Span type
* Thread metrics: Tags
When a breakdown is active, use the **aggregation toggle** to control how data is bucketed: **Total** shows one value per group for the entire date range, while **Time-based** shows values in time buckets (hourly, daily, or weekly). Click a label in the chart legend to navigate directly to the traces list filtered to that group.
* **Filters**: Apply trace or thread filters to focus on specific data based on tags, metadata, or other attributes
* **Feedback scores**: When using feedback score metrics, optionally select specific scores to display (leave empty to show all)
### Single metric widget (Multi-project)
Shows a single metric value with a compact card display. Ideal for summary dashboards and key performance indicators.
**Data sources:** Traces or Spans
**Trace-specific metrics:**
* Total trace count
* Total thread count
* Average LLM span count
* Average span count
* Average estimated cost per trace
* Total guardrails failed count
**Span-specific metrics:**
* Total span count
* Average estimated cost per span
**Shared metrics (available for both traces and spans):**
* P50 duration - Median duration
* P90 duration - 90th percentile duration
* P99 duration - 99th percentile duration
* Total input count
* Total output count
* Total metadata count
* Average number of tags
* Total estimated cost sum
* Output tokens (avg.)
* Input tokens (avg.)
* Total tokens (avg.)
* Total error count
* Average feedback scores - Any feedback score defined in your project
### Metrics widget (Experiments)
Compares feedback scores across multiple experiments. Ideal for visualizing A/B test results and prompt iteration outcomes.
**Chart types:**
* **Line chart** - Show trends across experiments (default)
* **Bar chart** - View detailed score distributions side by side
* **Radar chart** - Compare multiple feedback scores across experiments in a radial view
**Configuration options:**
* **Filters**: Filter experiments by:
* Dataset — show only experiments from a specific dataset
* Configuration — filter by metadata keys and values (e.g., model="gpt-4")
* Experiment IDs — include specific experiments by ID
* **Groups** (collapsible, collapsed by default): Group aggregated results by:
* Dataset — compare results across different datasets
* Configuration — group by metadata keys to aggregate feedback scores (e.g., group by model type)
* Supports up to 5 grouping levels for hierarchical comparisons
* **Max experiments**: Limit the number of experiments displayed
* **Chart type**: Choose line, bar, or radar chart visualization
* **Metrics**: Optionally display only specific feedback scores (leave empty to show all)
### Leaderboard widget (Experiments)
Displays a table comparing experiments with configurable columns. Useful for ranking experiments by specific metrics and comparing results at a glance.
**Configuration options:**
* **Filters**: Same filtering options as the Metrics widget (dataset, configuration, experiment IDs)
* **Groups**: Same grouping options as the Metrics widget
* **Max experiments**: Limit the number of experiments displayed
* **Columns**: Select and reorder which columns to display. The columns menu shows all available columns with an "N of N selected" indicator and drag handles for reordering
* **Ranking**: Rank experiments by a specific metric. Options are "No ranking" (default) and any available feedback score metric. When "No ranking" is selected, the ranking order option is disabled
### Markdown text widget
Available for both Multi-project and Experiments dashboards. Add custom notes, descriptions, or documentation using markdown formatting. Use this widget to:
* Add section headers and explanations
* Document dashboard purpose and context
* Include links to related resources
* Add team notes or guidelines
## Creating a workspace dashboard
1. Navigate to the **Dashboards** page from the sidebar
2. Click **Create new dashboard**
3. Select the dashboard type: **Multi-project** or **Experiments**
4. Enter a name (description is optional)
5. Click **Create**
## Adding and configuring widgets
When you click the **+** button within a section, a unified widget configuration modal opens:
1. **Select a widget type** from the clickable cards at the top. The available types depend on the dashboard type:
* **Multi-project**: Time series, Single metric, Markdown
* **Experiments**: Metrics, Leaderboard, Markdown
2. Configure the widget settings below. The configuration area updates based on the selected widget type.
3. Each widget has its own **project or experiment selector** — there are no global dashboard defaults. For Insight views, the current project is automatically set.
4. For chart widgets, select the **visualization type** (line, bar, or radar) using clickable cards.
5. Click **Save** to add the widget.
## Customizing dashboards
### Adding sections
Dashboards are organized into sections, each containing one or more widgets:
1. Click **Add section** at the bottom of the dashboard
2. Give the section a title
3. Add widgets to the section
### Editing widgets
1. Click the menu icon on any widget
2. Select **Edit** to modify the widget configuration
3. Make your changes and save
### Rearranging widgets
* **Drag and drop**: Use the drag handle on widgets to reorder them within a section
* **Resize**: Drag the edges of widgets to adjust their size
### Collapsing sections
Click on a section title to collapse or expand it. The collapsed state is preserved across sessions.
## Date range filtering
Use the date picker in the toolbar to filter data by time range. Select a preset range (Last 24 hours, Last 7 days, etc.) or choose custom dates.
**Widgets that use date range filtering:**
* Time series widget - filters time-series data to the selected range
* Single metric widget - calculates statistics within the selected range
**Widgets not affected by date range:**
* Experiments metrics widget - displays experiment results regardless of date
* Leaderboard widget - displays experiment results regardless of date
* Markdown text widget - static content
## Saving changes
All dashboard changes are **saved automatically**. Built-in Insight views are read-only — duplicate them to create an editable copy.
## Sharing dashboards
To share your current dashboard view:
1. Click the **Share** button in the toolbar
2. The URL is copied to your clipboard
3. Share this URL with team members who have access to the workspace
The shared URL includes the dashboard ID, active date range, and any active filters, so recipients see the same view.
## Next steps
* Set up [Online Evaluation Rules](/production/online-evaluation/rules) to automatically generate feedback scores for your dashboards
* Configure [Alerts](/production/alerts/alerts) to get notified when metrics exceed thresholds
* Learn about [Production Monitoring](/tracing/dashboards/production_monitoring) best practices
***
# Production monitoring
Opik has been designed from the ground up to support high volumes of traces, making it the ideal tool for monitoring your production LLM applications.
You can use the **Insights** tab within any project to review your feedback scores, trace count, latency, and cost over time. The built-in Project Overview provides an at-a-glance health check with stats cards and time-series charts. For more details, see [Dashboards](/tracing/dashboards/dashboards).
In addition to viewing scores over time, you can also view the average feedback scores for all the traces in your project from the traces table.
## Logging feedback scores
To monitor the performance of your LLM application, you can log feedback scores using the [Python SDK and through the UI](/tracing/advanced/annotate_traces).
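For example, a minimal sketch using the Python SDK's `log_traces_feedback_scores` method (the trace ID and score name here are illustrative):
```python
import opik

client = opik.Opik()

# Attach a feedback score to an existing trace by its ID
client.log_traces_feedback_scores(
    scores=[{"id": "your-trace-id", "name": "user_satisfaction", "value": 1.0}]
)
```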
### Defining online evaluation metrics
You can define LLM as a Judge metrics in the Opik platform that will automatically score all, or a subset, of your production traces. You can find more information about how to define LLM as a Judge metrics in the [Online evaluation](/production/online-evaluation/rules) section.
Once a rule is defined, Opik will score all the traces in the project and allow you to track these feedback scores over time.
Your agent pulls the active configuration at runtime, so you can update prompts and model settings without redeploying code. Guaranteed availability ensures your agent always has a valid configuration, even during updates.
## Test in the Agent Sandbox
Connect your local agent to Opik with a single command:
```bash
opik endpoint --project
```
Switch to the **Configuration** tab to tweak prompts and parameters without changing code. The sandbox runs your agent against the unsaved configuration so you can test before committing changes.
## Deploy new versions
When you're happy with a configuration, use the **Deploy to** button in the Agent Configuration UI to push it live. Your agent picks up the new version automatically — no code changes, no redeployment.
## More tools
Every run in the sandbox produces a full trace — every LLM call, tool invocation, and sub-step is
captured as spans with inputs, outputs, latencies, and token costs. Logs from your running agent
stream to the UI in real time, so you can watch execution as it happens.
The sandbox monitors your runner with a heartbeat and updates its status automatically if it
disconnects. When `--watch` is enabled, file changes are detected and your agents are
re-registered without restarting the process. For CI or programmatic setups, use `--headless` to
skip the browser pairing flow entirely.
## Troubleshooting
**"No entrypoint found" error**
Make sure at least one function is decorated with `@opik.track(entrypoint=True)` in Python or
`track({ entrypoint: true }, fn)` in TypeScript. The entrypoint must be discoverable from the
current working directory.
**Pairing times out**
The browser pairing session expires after 5 minutes. Re-run the command to generate a new session.
Make sure your Opik environment variables are set correctly — see
[Getting started with Observability](/tracing/getting-started) for configuration details.
**Runner disconnects**
Opik uses heartbeat monitoring to detect disconnects. If your runner shows as disconnected in the
UI, check that the process is still running locally and that your network connection is stable.
## FAQ
## Compare prompt variants side by side
Each variant in the Playground is independent — it has its own model, messages, and configuration.
This means you can test a prompt change against the current version, try different models on the
same prompt, or experiment with temperature and sampling parameters, all in a single view.
Supported providers include OpenAI, Anthropic, Gemini, OpenRouter, Vertex AI, and custom
endpoints. Reasoning models like Claude and o1/o3 expose additional controls such as thinking
effort.
Click **Run** (or press **Shift+Enter**) to execute all variants at once. Results stream in real
time with the model's response, token usage, latency, and a link to the full trace.
## Validate against test suites
The real power of the Playground is running your prompt variants against a dataset or test suite.
Instead of manually checking a handful of inputs, you can validate across your full set of test
cases and see which variant performs better.
## Adding Agent Configuration to your code
## Agent Configurations
An Agent Configuration is a versioned bundle of everything that defines how your agent behaves:
* **Prompts** — System prompts, user prompt templates, and multi-turn chat templates
* **Model parameters** — Model name, temperature, top-p, and other LLM settings
* **Tool definitions** — Descriptions and schemas for tools your agent can call
* **Custom parameters** — Any other string or numeric values your agent needs (e.g., RAG retrieval thresholds)
Storing these together gives you a single source of truth for how your agent was configured at any point in time.
## Versions
Every change creates a new immutable version (`v1`, `v2`, `v3`, etc.). Once created, a version can't be modified — so you always have a full audit trail and can roll back if needed.
You can create new versions through the SDK or the Opik UI. See the [Getting started](/prompt_engineering/getting-started) guide for details.
## Environments
Environments are labels you attach to a specific version to control which configuration your agent fetches at runtime. You manage these through the Opik UI — no code changes needed.
* **`prod`** — The production version (this is what the SDK fetches by default)
* **Custom labels** — Any label you create in the UI, like `staging` or `canary`
To update your agent in production, just move the `prod` label to a new version in the UI. Your agent picks it up on the next fetch.
### Fetching by version
Instead of using environment labels, you can also fetch a specific version directly using the `version` parameter:
* **`latest`**: The most recently created version
* **`v3`** (or any version name): A specific pinned version
***
# Guaranteed availability
Your agent shouldn't break because of a single failed network call. The SDK has two layers of protection built in — an in-memory cache and a hardcoded fallback — so your agent always has a configuration to work with.
## Caching layer
The SDK caches the last successfully fetched configuration in memory with a default TTL of **300 seconds (5 minutes)**. A background thread refreshes stale entries before they expire, so your agent never blocks on a network call during normal operation.
If Opik is unreachable during a refresh, the SDK keeps serving the cached value — your agent keeps running.
You can tune the cache TTL by setting the `OPIK_CONFIG_TTL_SECONDS` environment variable:
```bash
OPIK_CONFIG_TTL_SECONDS=60 # refresh every minute
```
## Fallback configuration
On the very first call (cold start), or if the cache is empty and Opik is unreachable, you can provide a hardcoded fallback. The SDK returns it instead of raising an error, so your agent can start serving requests right away — even before it has contacted the backend.
## Why teams choose Opik Agent Optimizer
* **Automatic prompt optimization** – end-to-end workflow that installs in minutes and runs locally or in your stack.
* **Open-source and framework agnostic** – no lock-in, use Opik’s first-party optimizers or community favorites like GEPA in the same SDK.
* **Agent-aware** – optimize beyond system prompts, including MCP tool signatures, function-calling schemas, and multi-step agent workflows.
* **Deep observability** – every trial logs prompts, tool calls, traces, and metric reasons to Opik so you can explain and ship changes confidently.
## Key capabilities
## Configure the run
### Name the run
Give the run a descriptive name so you can find it later. A good pattern is `goal + dataset + date`, for example “Support intent v1 - Jan 2026”.
### Configure the prompt
Choose the model that will generate responses, then set the message roles (System, User, and so on). If your dataset has fields like `question` or `answer`, insert them with `{{variable}}` placeholders so each example flows into the prompt correctly. Start with the prompt you already use in production so improvements are easy to compare.
### Pick an algorithm
Choose how Opik should search for better prompts. GEPA works well for single-turn prompts and quick improvements, while HRPO is better when you need deeper analysis of why a prompt fails. If you are new, start with GEPA to get a quick baseline, then switch to HRPO if you need deeper insight. For technical details, see [Optimization algorithms](/development/optimization-runs/algorithms/overview).
### Choose a dataset
Pick an existing dataset to supply examples. Aim for diverse, real-world cases rather than edge cases only, and keep the first run small so you can iterate quickly. If you need to create or upload data first, see [Manage datasets](/evaluation/advanced/manage_datasets).
### Define a metric
Pick how Opik should score each prompt. Use Equals if the output should match exactly, or G-Eval if you want a model to grade quality. When using G-Eval, make sure the grading prompt reflects what “good” means for your task.
* **Equals**: Use when you have a single correct answer and want a strict match.
* **G-Eval**: Use when answers can vary and you want a model to score quality.
## Monitor progress
Once the run starts, Optimization Studio shows the best score so far and a progress chart for each trial.
## Analyze results
The Trials tab is where you compare prompt variations and scores. By clicking on a specific trial, you can view the individual trial items that were evaluated.
## Actions
You can rerun the same setup, cancel a run to change inputs, or select multiple runs to compare outcomes.
## Reuse results outside the UI
If you want to automate optimizations in code later, follow [Optimize prompts](/development/optimization-runs/optimization/optimize_prompts) and use the same dataset and metric from this run.
## Next steps
For a deeper breakdown of trials and traces, visit [Dashboard results](/development/optimization-runs/optimization/dashboard_results). If you want to automate this workflow, use [Optimize prompts](/development/optimization-runs/optimization/optimize_prompts). To fine-tune your strategy, explore [Optimization algorithms](/development/optimization-runs/algorithms/overview).
***
# Quickstart
Install the Agent Optimizer SDK, run your first optimization, and inspect the results in under 10 minutes.
You can use the `FewShotBayesianOptimizer` to optimize a prompt by following these steps:
```python maxLines=1000
from opik_optimizer import FewShotBayesianOptimizer
from opik.evaluation.metrics import LevenshteinRatio
from opik_optimizer import datasets, ChatPrompt
# Initialize optimizer
optimizer = FewShotBayesianOptimizer(
model="openai/gpt-4",
model_parameters={
"temperature": 0.1,
"max_tokens": 5000
},
)
# Prepare dataset
dataset = datasets.hotpot(count=300)
# Define metric and prompt (see docs for more options)
def levenshtein_ratio(dataset_item, llm_output):
return LevenshteinRatio().score(reference=dataset_item["answer"], output=llm_output)
prompt = ChatPrompt(
messages=[
{"role": "system", "content": "Provide an answer to the question."},
{"role": "user", "content": "{question}"}
]
)
# Run optimization
results = optimizer.optimize_prompt(
prompt=prompt,
dataset=dataset,
metric=levenshtein_ratio,
n_samples=100
)
# Access results
results.display()
```
## Configuration Options
### Optimizer parameters
The optimizer has the following parameters:
## Quickstart
```python
"""
Optimize a simple system prompt on the tiny_test dataset.
Requires: pip install gepa, and a valid OPENAI_API_KEY for LiteLLM-backed models.
"""
from typing import Any, Dict
from opik.evaluation.metrics import LevenshteinRatio
from opik.evaluation.metrics.score_result import ScoreResult
from opik_optimizer import ChatPrompt, datasets
from opik_optimizer.gepa_optimizer import GepaOptimizer
def levenshtein_ratio(dataset_item: Dict[str, Any], llm_output: str) -> ScoreResult:
return LevenshteinRatio().score(reference=dataset_item["label"], output=llm_output)
dataset = datasets.tiny_test()
prompt = ChatPrompt(
system="You are a helpful assistant. Answer concisely with the exact answer.",
user="{text}",
)
optimizer = GepaOptimizer(
model="openai/gpt-4o-mini",
n_threads=6,
temperature=0.2,
max_tokens=200,
)
result = optimizer.optimize_prompt(
prompt=prompt,
dataset=dataset,
metric=levenshtein_ratio,
max_trials=12,
reflection_minibatch_size=2,
n_samples=5,
)
result.display()
```
### Determinism and tool usage
* GEPA’s seed is forwarded directly to the underlying `gepa.optimize` call, but any non-determinism in your prompt (tool calls, non-zero temperature, external APIs) will still introduce variance. To test seeding in isolation, disable tools or substitute cached responses.
* GEPA emits its own baseline evaluation inside the optimization loop. You’ll see one baseline score from Opik’s wrapper and another from GEPA before the first trial; this is expected and does not double-charge the metric budget.
* Reflection only triggers after GEPA accepts at least `reflection_minibatch_size` unique prompts. If the minibatch is larger than the trial budget, the optimizer logs a warning and skips reflection.
* GEPA supports **tool use during evaluation** (`allow_tool_use=True`) but does **not** support `optimize_tools=True` yet. Tool-description optimization requests are currently degraded/blocked until the adapter supports it.
### GEPA scores vs. Opik scores
* The **GEPA Score** column reflects the aggregate score GEPA computes on its train/validation split when deciding which candidates stay on the Pareto front. It is useful for understanding how GEPA’s evolutionary search ranks prompts.
* The **Opik Score** column is a fresh evaluation performed through Opik’s metric pipeline on the same dataset (respecting `n_samples`). This is the score you should use when comparing against your baseline or other optimizers.
* Because the GEPA score is based on GEPA’s internal aggregation, it can diverge from the Opik score for the same prompt. This is expected—treat the GEPA score as a hint about why GEPA kept or discarded a candidate, and rely on the Opik score for final comparisons.
### `skip_perfect_score`
* When `skip_perfect_score=True`, GEPA immediately ignores any candidate whose GEPA score meets or exceeds the `perfect_score` threshold (default `1.0`). This keeps the search moving toward imperfect prompts instead of spending budget refining already perfect ones.
* Set `skip_perfect_score=False` if your metric tops out below `1.0`, or if you still want to see how GEPA mutates a perfect-scoring prompt—for example, when you care about ties being broken by Opik’s rescoring step rather than GEPA’s aggregate.
## Configuration Options
### Optimizer parameters
The optimizer has the following parameters:
## Two approaches to evaluation
Opik provides two complementary approaches to evaluation:
* **Test Suites**: Define natural-language assertions and let an LLM judge check them automatically. Best for pass/fail testing of specific behaviors.
* **Datasets & Metrics**: Score your agent's outputs against a dataset using pre-built or custom metrics. Best for measuring quality across many traces with quantitative scores.
## Key features
* **Test Suites** with natural-language assertions and execution policies
* **30+ pre-built metrics** for hallucination, relevance, coherence, and more
* **Custom metrics** for domain-specific evaluation
* **Experiment tracking** to compare versions side-by-side
* **Annotation Queues** for human-in-the-loop review
## Next steps
* [Getting started](/evaluation/getting-started) — Run your first evaluation in minutes
* [Concepts](/evaluation/concepts) — Understand Test Suites vs Datasets & Metrics
* [Building Test Suites](/evaluation/advanced/building-test-suites) — Create and manage suites via the SDK, UI, or Ollie
* [Debugging agents with Ollie](/tracing/debug-agents) — The full workflow for turning production failures into test cases
***
# Getting started with Evaluation
Opik provides two approaches to evaluation. Choose the one that fits your use case:
* **Test Suites**: Define assertions in natural language and let an LLM judge test them. Best for pass/fail behavioral testing.
* **Datasets & Metrics**: Score outputs against a dataset using quantitative metrics. Best for measuring quality across many traces.
## Quick start
See the [Building Test Suites](/evaluation/advanced/building-test-suites) guide for the full walkthrough.
## With the SDK
### Create a suite
Define the quality bars you care about as suite-level assertions:
## Advanced usage
### Missing arguments for scoring methods
When you face the `opik.exceptions.ScoreMethodMissingArguments` exception, it means that the dataset
item and task output dictionaries do not contain all the arguments expected by the scoring method.
The way the evaluate function works is by merging the dataset item and task output dictionaries and
then passing the result to the scoring method. For example, if the dataset item contains the keys
`user_question` and `context` while the evaluation task returns a dictionary with the key `output`,
the scoring method will be called as `scoring_method.score(user_question='...', context='...', output='...')`.
This can be an issue if the scoring method expects a different set of arguments.
You can solve this by either updating the dataset item or evaluation task to return the missing
arguments or by using the `scoring_key_mapping` parameter of the `evaluate` function. In the example
above, if the scoring method expects `input` as an argument, you can map the `user_question` key to
the `input` key as follows:
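A minimal sketch of that mapping (assuming `dataset`, `evaluation_task`, and a metric expecting an `input` argument are already defined):
```python
from opik.evaluation import evaluate

evaluation = evaluate(
    dataset=dataset,
    task=evaluation_task,
    scoring_metrics=[my_metric],
    # Map the metric's expected `input` argument to the dataset's `user_question` key
    scoring_key_mapping={"input": "user_question"},
)
```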
### Logging traces to a specific project
You can use the `project_name` parameter of the `evaluate` function to log evaluation traces to a specific project:
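For example (assuming the same `dataset`, `evaluation_task`, and metric as above):
```python
evaluation = evaluate(
    dataset=dataset,
    task=evaluation_task,
    scoring_metrics=[my_metric],
    # Evaluation traces are logged to this project instead of the default one
    project_name="my-evaluation-project",
)
```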
If you need to create a dataset with more than 1,000 rows, you [can use the SDK](/evaluation/advanced/manage_datasets#creating-a-dataset-using-the-sdk).
### Saving or discarding changes
To commit your draft as a new version:
1. Click **Save changes** in the toolbar
2. Enter a **version note** describing what changed
3. Optionally add **tags** to categorize this version
4. Click **Save**
To abandon your draft, click **Discard changes** and confirm. If you try to navigate away with unsaved changes, Opik displays a warning to prevent accidental loss of work.
From this view you can:
* **View items**: Click a version row and select **View items** to see the exact data at that point in time
* **Restore**: Click the **⋮** menu and select **Restore this version** to create a new version with that data
* **Edit metadata**: Click the **⋮** menu and select **Edit** to update the version note or tags (the data itself remains immutable)
#### Inserting items from a JSONL file
You can also insert items from a JSONL file:
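A minimal sketch (assuming each line of the file is a JSON object matching your dataset schema):
```python
import opik

client = opik.Opik()
dataset = client.get_or_create_dataset(name="My dataset")

# Each line of the JSONL file is inserted as one dataset item
dataset.read_jsonl_from_file("path/to/items.jsonl")
```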
### Configuration options
**Sample Count**: Start with a smaller number (10-20) to review the quality before generating larger batches.
**Preserve Fields**: Use this to maintain consistency in certain fields while allowing variation in others. For example, preserve the `category` field while varying the `input` and `expected_output`.
**Variation Instructions**: Provide specific guidance such as:
* "Create variations with different difficulty levels"
* "Generate edge cases and error scenarios"
* "Add examples with different input formats"
* "Include multilingual variations"
### Best practices
* **Start small**: Generate 10-20 samples first to evaluate quality before scaling up
* **Review generated content**: Always review AI-generated samples for accuracy and relevance
* **Use variation instructions**: Provide clear guidance on the type of variations you want
* **Preserve key fields**: Use field preservation to maintain important categorizations or metadata
* **Iterate and refine**: Use the custom prompt option to fine-tune generation for your specific needs
This works with filtered views too—if you have a filter applied, "Select all" only selects items matching that filter.
### Available bulk operations
Once you have items selected, the toolbar shows available operations:
* **Add tags**: Apply one or more tags to all selected items
* **Delete**: Remove selected items (creates a new version with items removed)
* **Export**: Download selected items as CSV or JSON
### Processing indicators
For large bulk operations:
* A loading indicator shows "Your dataset is still processing..."
* The operation runs in the background—you can continue browsing
* A success message appears when processing completes
## Prerequisites
Before evaluating agent trajectories, you need:
1. **Opik SDK installed and configured** — See [Quickstart](/quickstart) for setup
2. **Agent with observability enabled** — Your agent must be instrumented with Opik tracing
3. **Test dataset** — Examples with expected agent behavior
If your agent isn't traced yet, see [Log Traces](/tracing/advanced/log_traces) to add observability first.
### Installing the Opik SDK
To install the Opik Python SDK you can run the following command:
```bash
pip install opik
```
Then you can configure the SDK by running the following command:
```bash
opik configure
```
This will prompt you for your API key and workspace or your instance URL if you are self-hosting.
### Adding observability to your agent
In order to be able to evaluate the agent's trajectory, you need to add tracing to your agent. This
will allow us to capture the agent's trajectory and evaluate it.
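If your agent is not instrumented yet, the simplest starting point is the `@track` decorator (a minimal sketch; decorate your real agent functions):
```python
from opik import track

@track
def my_agent(user_question: str) -> str:
    # Each call creates a trace; nested @track functions become spans
    return "agent answer"
```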
## Creating the user simulator
In order to perform multi-turn evaluation, we need to create a user simulator that will generate
the user's response based on previous turns:
```python title="User simulator" maxLines=1000
from opik.simulation import SimulatedUser
user_simulator = SimulatedUser(
persona="You are a frustrated user who wants a refund",
model="openai/gpt-4.1",
)
conversation_history = [
{"role": "assistant", "content": "Hello, how can I help you today?"}
]
for turn in range(3):
# Generate a user message based on the conversation so far
user_message = user_simulator.generate_response(conversation_history)
conversation_history.append({"role": "user", "content": user_message})
print(f"User: {user_message}")
# In practice, this would be your agent's response
agent_response = f"Placeholder agent response for turn {turn + 1}"
conversation_history.append({"role": "assistant", "content": agent_response})
print(f"Assistant: {agent_response}\n")
```
Now that we have a way to simulate the user, we can create multiple simulations that we will in
turn evaluate.
## Running simulations
## Next steps
* Learn more about [conversation metrics](/evaluation/metrics/conversation_threads_metrics)
* Learn more about [custom conversation metrics](/evaluation/metrics/custom_conversation_metric)
* Learn more about [evaluate\_threads](/evaluation/evaluate_threads)
* Learn more about [agent trajectory evaluation](/evaluation/advanced/evaluate_agent_trajectory)
***
# Annotation Queues
Enable subject matter experts to review and annotate agent outputs with easy queues, invitations, and a clean annotation UI.
Involving subject matter experts in AI projects is essential because they provide the domain knowledge and contextual judgment that ensures model outputs are accurate, relevant, and aligned with real-world expectations. Annotation Queues in Opik make it simple for subject matter experts (SMEs) to review and annotate agent outputs. This feature streamlines the human-in-the-loop process by providing easy queue management, simple invitation flows, and a distraction-free annotation experience designed for non-technical users.

Annotation Queues are collections of traces or threads that need human review and feedback. They enable you to organize content for review, share with SMEs easily, collect structured feedback, and track progress across all your evaluation workflows.
## Creating and Managing Annotation Queues
Each annotation queue is defined by a collection of traces or threads, evaluation instructions, and feedback definitions:
1. **Queue Configuration**: Set up the queue with clear instructions and scope
2. **Content Selection**: Add traces or threads that need human review
3. **SME Access**: Share queue links with subject matter experts for annotation
### Setting Up Your First Queue
Navigate to the **Annotation Queues** page in your project and click **Create Queue**.
Configure your queue with:
* **Name**: Clear identification for your queue
* **Scope**: Choose between traces or threads
* **Instructions**: Provide context and guidance for reviewers
* **Feedback Definitions**: Select the metrics SMEs will use for scoring
### Adding Content to Your Queue
You can add items to your queue in several ways:
**From Traces/Threads Lists:**
* Select one or multiple items
* Click **Add to -> Add to annotation queue**
* Choose an existing queue or create a new one
**From Individual Trace/Thread Details:**
* Open the trace or thread detail view
* Click **Add to -> Add to annotation queue** in the actions panel
* Select your target queue
### Sharing with Subject Matter Experts
Once your queue is set up, share it with SMEs by copying the queue link and sending it to them.
The annotation workflow begins with clear instructions and context, so SMEs understand what they're evaluating and how to provide meaningful feedback.
The SME interface provides:
1. **Clean, focused design**: No technical jargon or complex navigation
2. **Clear instructions**: Queue-specific guidance displayed prominently
3. **Structured feedback**: Predefined metrics with clear descriptions
4. **Progress tracking**: Visual indicators of completion status
5. **Comment system**: Optional text feedback for additional context
### Annotation Workflow
1. **Access the queue**: SME clicks the shared link
2. **Review content**: Examine the trace or thread output
3. **Provide feedback**: Score using predefined metrics
4. **Add comments**: Optional text feedback
5. **Submit and continue**: Move to the next item
## Managing Queues programmatically
You can create and manage annotation queues programmatically using the Python or TypeScript SDK. This is useful for automating the process of adding items to queues based on specific criteria.
### Creating an Annotation Queue
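The SDK surface for annotation queues is evolving, so treat the sketch below as illustrative: `create_annotation_queue` and `add_traces` are hypothetical names, and you should check the SDK reference for the current method signatures.
```python
import opik

client = opik.Opik()

# Hypothetical API -- method names are illustrative placeholders
queue = client.create_annotation_queue(
    name="production-review",
    scope="traces",  # or "threads"
    instructions="Rate each answer for correctness and tone.",
)

# Hypothetical helper: enqueue specific traces for SME review
queue.add_traces(trace_ids=["trace-uuid-1", "trace-uuid-2"])
```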
For each interaction with the end user, you can also see how the agent planned, chose tools, and crafted an answer based on the user input, the agent graph, and much more.
During the development phase, having access to all this information is fundamental for debugging and understanding what is working as expected and what is not.
**Error detection**
Having immediate access to all traces that returned an error can also be a life-saver, and Opik makes this extremely easy to achieve:
For each of the errors and exceptions captured, you have access to all the details you need to fix the issue:
### 2. Evaluate Agent's End-to-end Behavior
Once you have full visibility on the agent interactions, memory and tool usage, and you made sure everything is working at the technical level, the next logical step is to start checking the quality of the responses and the actions your agent takes.
**Human Feedback**
The fastest and easiest way to do this is to provide manual human feedback. Each trace and each span can be rated "Correct" or "Incorrect" by a person (most probably you!), which gives you a baseline for understanding the quality of the responses.
You can provide human feedback and a comment for each trace's score in Opik, and when you're done you can store all the results in a dataset to use in the next iterations of agent optimization.
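For example, with the Python SDK you can attach a human feedback score to an existing trace; the trace ID below is a placeholder:
```python
import opik

client = opik.Opik()

# Attach a human feedback score (and an optional reason) to a trace
client.log_traces_feedback_scores(
    scores=[
        {
            "id": "trace-uuid",  # placeholder: the ID of the trace you are rating
            "name": "Correct",
            "value": 1.0,
            "reason": "The answer matches the reference documentation.",
        }
    ]
)
```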
**Online evaluation**
Marking an answer as simply “correct” or “incorrect” is a useful first step, but it’s rarely enough. As your agent grows more complex, you’ll want to measure how well it performs across more nuanced dimensions.
That’s where online evaluation becomes essential.
With Opik, you can automatically score traces using a wide range of metrics, such as answer relevance, hallucination detection, agent moderation, user moderation, or even custom criteria tailored to your specific use case. These evaluations run continuously, giving you structured feedback on your agent’s quality without requiring manual review.
#### What Happens Next? Iterate, Improve, and Compare
Running the experiment once gives you a **baseline**: a first measurement of how good (or bad) your agent's tool selection behavior is.
But the real power comes from **using these results to improve your agent** — and then **re-running the experiment** to measure progress.
Here’s how you can use this workflow:
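A minimal sketch of that loop with the Python SDK follows; the dataset name, the `my_agent` call, and the metric choice are illustrative placeholders:
```python
from opik import Opik
from opik.evaluation import evaluate
from opik.evaluation.metrics import Equals

client = Opik()
dataset = client.get_dataset(name="tool-selection-cases")  # illustrative dataset

def evaluation_task(item):
    # Call your (updated) agent on each dataset item; `my_agent` is a placeholder
    output = my_agent(item["input"])
    return {"output": output}

# Re-run this after every change to the agent; each run is stored as a new experiment
evaluate(
    dataset=dataset,
    task=evaluation_task,
    scoring_metrics=[Equals()],  # assumes dataset items contain a `reference` field
    experiment_name="tool-selection-v2",
)
```
Each run appears as a separate experiment in Opik, so you can compare the new scores against the baseline side by side.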
When creating a new rule, you will be presented with the following options:
1. **Name:** The name of the rule
2. **Sampling rate:** The percentage of traces to score. When set to `100%`, all traces will be scored.
3. **Model:** The model to use to run the LLM as a Judge metric. For evaluating traces with images, make sure to select a model that supports vision capabilities.
4. **Prompt:** The LLM as a Judge prompt to use. Opik provides a set of base prompts (Hallucination, Moderation, Answer Relevance) that you can use or you can define your own. Variables in the prompt should be in `{{variable_name}}` format.
5. **Variable mapping:** This is the mapping of the variables in the prompt to the values from the trace.
6. **Score definition:** This is the format of the output of the LLM as a Judge metric. By adding more than one score, you can define LLM as a Judge metrics that score an LLM output along different dimensions.
### Opik's built-in LLM as a Judge metrics
Opik comes pre-configured with 3 different LLM as a Judge metrics:
1. Hallucination: This metric checks if the LLM output contains any hallucinated information.
2. Moderation: This metric checks if the LLM output contains any offensive content.
3. Answer Relevance: This metric checks if the LLM output is relevant to the given context.
When writing your own LLM as a Judge metric, you will need to specify the prompt variables using the mustache syntax, i.e.
`{{ variable_name }}`. You can then map these variables to your trace data using the `variable_mapping` parameter. When the
rule is executed, Opik will replace the variables with the values from the trace data.
You can control the format of the output using the `Scoring definition` parameter. This is where you can define the scores you want the LLM as a Judge metric to return. Under the hood, we will use this definition in conjunction with the [structured outputs](https://platform.openai.com/docs/guides/structured-outputs) functionality to ensure that the LLM as a Judge metric always returns trace scores.
### Evaluating traces with images
LLM as a Judge metrics can evaluate traces that contain images when using vision-capable models. This is useful for:
* Evaluating image generation quality
* Analyzing visual content in multimodal applications
* Validating image-based responses
To reference image data from traces in your evaluation prompts:
1. In the prompt editor, click the **"Images +"** button to add an image variable
2. Map the image variable to the trace field containing image data using the Variable Mapping section
We have built-in templates for the LLM as a Judge metrics that you can use to score the entire conversation:
1. **Conversation Coherence:** This metric checks whether the conversation is coherent and follows a logical flow, returning a decimal score between 0 and 1.
2. **User Frustration:** This metric checks whether the user is frustrated with the conversation, returning a decimal score between 0 and 1.
3. **Custom LLM as a Judge metrics:** You can use this template to score the entire conversation using your own LLM as a Judge metric. By default, this template uses binary scoring (true/false) following best practices.
For the LLM as a Judge metrics, keep in mind that the only variable available is `{{context}}`, which contains the entire conversation as a list of messages:
```json
[
  {
    "role": "user",
    "content": "Hello, how are you?"
  },
  {
    "role": "assistant",
    "content": "I'm good, thank you!"
  }
]
```
Similarly, for the Python metrics, you have the `Conversation` object available to you. This object is a `List[Dict]` where each dict represents a message in the conversation.
```python
[
    {
        "role": "user",
        "content": "Hello, how are you?"
    },
    {
        "role": "assistant",
        "content": "I'm good, thank you!"
    }
]
```
For online scoring rules on threads, Opik waits for a "cooldown period" after the last activity in a thread
before running the evaluation. This ensures the scoring is done on the full context of the conversation.
# How it works
Conceptually, we need to check each input and output for a series of risks and take action when one is detected.
The ideal method depends on the type of problem and aims to pick the best combination of accuracy, latency, and cost.
There are three commonly used methods:
1. **Heuristics or traditional NLP models**: ideal for checking for PII or competitor mentions
2. **Small language models**: ideal for staying on topic
3. **Large language models**: ideal for detecting complex issues like hallucination
# Types of guardrails
Providers like OpenAI and Anthropic have built-in guardrails for risks like harmful or malicious content, and these are desirable for the vast majority of users.
The Opik Guardrails aim to cover the residual risks, which are often very user-specific and need to be configured in more detail.
## PII guardrail
The PII guardrail checks for sensitive information, such as name, age, address, email, phone number, or credit card details.
The specific entities can be configured in the SDK call; see the reference documentation for details.
*The method used here leverages traditional NLP models for tokenization and named entity recognition.*
## Topic guardrail
The topic guardrail ensures that the inputs and outputs remain on topic.
You can configure the allowed or disallowed topics in the SDK call; see the reference documentation for details.
*This guardrail relies on a small language model, specifically a zero-shot classifier.*
## Custom guardrail
The custom guardrail lets you define your own guardrails using a custom model, a custom library, or custom business logic, and log the result to Opik. Below is a basic example that filters out competitor brands:
```python
import opik
import opik.opik_context
import traceback

# Brand mention detection
competitor_brands = [
    "OpenAI",
    "Anthropic",
    "Google AI",
    "Microsoft Copilot",
    "Amazon Bedrock",
    "Hugging Face",
    "Mistral AI",
    "Meta AI",
]

opik_client = opik.Opik()

def custom_guardrails(generation: str, trace_id: str) -> str:
    # Start the guardrail span first so the duration is accurately captured
    guardrail_span = opik_client.span(
        name="Guardrail",
        input={"generation": generation},
        type="guardrail",
        trace_id=trace_id,
    )

    # Custom guardrail logic - detect competitor brand mentions
    found_brands = []
    for brand in competitor_brands:
        if brand.lower() in generation.lower():
            found_brands.append(brand)

    # The key `guardrail_result` is required by Opik guardrails and must be either "passed" or "failed"
    if found_brands:
        guardrail_result = "failed"
        output = {"guardrail_result": guardrail_result, "found_brands": found_brands}
    else:
        guardrail_result = "passed"
        output = {"guardrail_result": guardrail_result}

    # Log the span
    guardrail_span.end(output=output)

    # Upload the guardrail data for project-level metrics
    guardrail_data = {
        "project_name": opik_client._project_name,
        "entity_id": trace_id,
        "secondary_id": guardrail_span.id,
        "name": "TOPIC",  # Supports either "TOPIC" or "PII"
        "result": guardrail_result,
        "config": {"blocked_brands": competitor_brands},
        "details": output,
    }
    try:
        opik_client.rest_client.guardrails.create_guardrails(guardrails=[guardrail_data])
    except Exception:
        traceback.print_exc()

    return generation

@opik.track
def main():
    good_generation = "You should use our AI platform for your machine learning projects!"
    custom_guardrails(good_generation, opik.opik_context.get_current_trace_data().id)

    bad_generation = "You might want to try OpenAI or Google AI for your project instead."
    custom_guardrails(bad_generation, opik.opik_context.get_current_trace_data().id)

if __name__ == "__main__":
    main()
```
After running the custom guardrail example above, you can view the results in the Opik dashboard. The guardrail spans will appear alongside your traces, showing which brand mentions were detected and whether the guardrail passed or failed.
# Getting started
## Running the guardrail backend
You can start the guardrails backend by running:
```bash
./opik.sh --guardrails
```
## Using the Python SDK
```python
from opik.guardrails import Guardrail, PII, Topic
from opik import exceptions

guardrail = Guardrail(
    guards=[
        Topic(restricted_topics=["finance", "health"], threshold=0.9),
        PII(blocked_entities=["CREDIT_CARD", "PERSON"]),
    ]
)

llm_response = "You should buy some NVIDIA stocks!"

try:
    guardrail.validate(llm_response)
except exceptions.GuardrailValidationFailed as e:
    print(e)
```
The immediate result of a guardrail failure is an exception, and your application code will need to handle it.
The call is blocking, since the main purpose of the guardrail is to prevent the application from proceeding with a potentially undesirable response.
### Guarding streaming responses and long inputs
You can call `guardrail.validate` repeatedly to validate the response chunk by chunk, or on partial or combined chunks.
The results will be added as additional spans to the same trace.
```python
for chunk in response:
    try:
        guardrail.validate(chunk)
    except exceptions.GuardrailValidationFailed as e:
        print(e)
```
## Working with the results
### Examining specific traces
When a guardrail fails on an LLM call, Opik automatically adds the information to the trace.
You can filter the traces in your project to only view those that have failed the guardrails.
### Analyzing trends
You can also view how often each guardrail is failing in the Metrics section of the project.
## Performance and limit considerations
The guardrails backend will use a GPU automatically if there is one available.
For production use, running the guardrails backend on a GPU node is strongly recommended.
Current limits:
* Topic guardrail: the maximum input size is 1024 tokens
* Both the Topic and PII guardrails currently support English only
***
headline: Anonymizers | Opik Documentation
og:description: Protect sensitive information in your LLM applications with Opik's Anonymizers, ensuring compliance and preventing accidental data exposure.
og:site_name: Opik Documentation
og:title: Anonymizers - Opik for Secure LLM Applications
title: Anonymizers
---------------------
# How it works
Anonymizers work by processing all data that flows through Opik's tracing system - including inputs, outputs, and metadata - before it's stored or displayed. They apply a set of rules to detect and replace sensitive information with anonymized placeholders.
The anonymization happens automatically and transparently:
1. **Data Ingestion**: When you log traces and spans to Opik
2. **Rule Application**: Registered anonymizers scan the data using their configured rules
3. **Replacement**: Sensitive information is replaced with anonymized placeholders
4. **Storage**: Only the anonymized data is stored in Opik
# Types of Anonymizers
## Rules-based Anonymizer
The most common type of anonymizer uses pattern-matching rules to identify and replace sensitive information. Rules can be defined in several formats:
### Regex Rules
Use regular expressions to match specific patterns:
```python
import opik
import opik.hooks
from opik.anonymizer import create_anonymizer

# Dictionary format
email_rule = {"regex": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b", "replace": "[EMAIL]"}

# Tuple format
phone_rule = (r"\b\d{3}-\d{3}-\d{4}\b", "[PHONE]")

# Create anonymizer with multiple rules
anonymizer = create_anonymizer([email_rule, phone_rule])

# Register globally
opik.hooks.add_anonymizer(anonymizer)
```
### Function Rules
Use custom Python functions for more complex anonymization logic:
```python
import opik
import opik.hooks
from opik.anonymizer import create_anonymizer

def mask_api_keys(text: str) -> str:
    """Custom function to anonymize API keys"""
    import re

    # Match common API key patterns
    api_key_pattern = r'\b(sk-[a-zA-Z0-9]{32,}|pk_[a-zA-Z0-9]{24,})\b'
    return re.sub(api_key_pattern, '[API_KEY]', text)

def anonymize_with_hash(text: str) -> str:
    """Replace emails with consistent hashes for tracking without exposing PII"""
    import re
    import hashlib

    def hash_replace(match):
        email = match.group(0)
        hash_val = hashlib.md5(email.encode()).hexdigest()[:8]
        return f"[EMAIL_{hash_val}]"

    email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'
    return re.sub(email_pattern, hash_replace, text)

# Create anonymizer with function rules
anonymizer = create_anonymizer([mask_api_keys, anonymize_with_hash])
opik.hooks.add_anonymizer(anonymizer)
```
### Mixed Rules
Combine different rule types for comprehensive anonymization:
```python
import opik
import opik.hooks
from opik.anonymizer import create_anonymizer

# Mix of dictionary, tuple, and function rules
mixed_rules = [
    {"regex": r"\b\d{3}-\d{2}-\d{4}\b", "replace": "[SSN]"},   # Social Security Numbers
    (r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b", "[CARD]"),    # Credit Cards
    lambda text: text.replace("CONFIDENTIAL", "[REDACTED]"),   # Custom replacements
]

anonymizer = create_anonymizer(mixed_rules)
opik.hooks.add_anonymizer(anonymizer)
```
## Custom Anonymizers
For advanced use cases, create custom anonymizers by extending the `Anonymizer` base class.
### Understanding Anonymizer Arguments
When implementing custom anonymizers, you need to implement the `anonymize()` method with the following signature:
```python
def anonymize(self, data, **kwargs):
    # Your anonymization logic here
    return anonymized_data
```
**The `kwargs` parameters:**
The `anonymize()` method also receives additional context through `**kwargs`:
* **`field_name`**: Indicates which field is being anonymized (`"input"`, `"output"`, `"metadata"`, or nested field names in dot notation, such as `"metadata.email"`)
* **`object_type`**: The type of the object being processed (`"span"`, `"trace"`)
**When are kwargs available?**
These kwargs are automatically passed by Opik's internal data processors when anonymizing trace and span data before sending it to the backend. This allows you to apply different anonymization strategies based on the field being processed.
**Example: Field-specific anonymization**
```python
from opik.anonymizer import Anonymizer
import opik.hooks

class FieldAwareAnonymizer(Anonymizer):
    def anonymize(self, data, **kwargs):
        field_name = kwargs.get("field_name", "")

        # Only anonymize the output field, leave input as-is for debugging
        if field_name == "output" and isinstance(data, str):
            import re

            # More aggressive anonymization for outputs
            data = re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b', '[EMAIL]', data)
            data = re.sub(r'\b\d{3}-\d{3}-\d{4}\b', '[PHONE]', data)
        elif field_name == "metadata" and isinstance(data, dict):
            # Remove specific metadata fields entirely
            sensitive_keys = ["user_id", "session_token", "api_key"]
            for key in sensitive_keys:
                if key in data:
                    data[key] = "[REDACTED]"

        return data

# Register the field-aware anonymizer
opik.hooks.add_anonymizer(FieldAwareAnonymizer())
```
Whichever approach you choose, implement your detection logic in `anonymize()` or a function rule, then return the redacted data back to Opik.
## Creating an alert
### Prerequisites
* Access to the Opik Configuration page
* A webhook endpoint that can receive HTTP POST requests
* (Optional) An HTTPS endpoint with valid SSL certificate for production use
### Step-by-step guide
1. **Navigate to Alerts**
* Go to Configuration → Alerts tab
* Click "Create new alert" button
2. **Configure basic settings**
* **Name**: Give your alert a descriptive name (e.g., "Production Errors Slack")
* **Enable alert**: Toggle on to activate the alert immediately
3. **Configure webhook settings**
* **Destination**: Select the alert destination type:
* **General**: For custom webhooks, no-code automation platforms, or middleware services
* **Slack**: For native Slack webhook integration (automatically formats messages for Slack)
* **PagerDuty**: For native PagerDuty integration (automatically formats events for PagerDuty)
* **Endpoint URL**: Enter your webhook URL (must start with `http://` or `https://`)
* For Slack: Use your Slack Incoming Webhook URL (e.g., `https://hooks.slack.com/services/...`)
* For PagerDuty: Use your PagerDuty Events API v2 integration URL (e.g., `https://events.pagerduty.com/v2/enqueue`)
* For General: Use any HTTP endpoint that can receive POST requests
4. **Advanced webhook settings** (optional)
* **Secret token**: Add a secret token to verify webhook authenticity (recommended for General destination)
* **Custom headers**: Add HTTP headers for authentication or routing
* Example: `X-Custom-Auth: Bearer your-token-here`
5. **Add triggers**
* Click "Add trigger" to select event types
* Choose one or more event types from the list
* Configure project scope for observability events (optional)
* For threshold-based alerts (errors, cost, latency, feedback scores):
* **Threshold**: Set the threshold value that triggers the alert
* **Operator**: Choose comparison operator (`>`, `<`) for feedback score alerts
* **Window**: Configure the time window in seconds for metric aggregation
* **Feedback Score Name**: Select which feedback score to monitor (for feedback score alerts only)
6. **Test your configuration**
* Click "Test connection" to send a sample webhook
* Verify your endpoint receives the test payload
* Check the response status in the Opik UI
7. **Create the alert**
* Click "Create alert" to save your configuration
* The alert will start monitoring events immediately
## Integration examples
Opik supports three main approaches for integrating alerts with external systems:
1. **Native integrations** (Slack, PagerDuty): Use built-in formatting for popular services - no middleware required
2. **General webhooks**: Send alerts to custom endpoints, no-code platforms, or middleware services
3. **Middleware services** (Optional): Add custom logic, routing, or transformations before forwarding to destinations
### Slack integration (Native)
Opik provides native Slack integration that automatically formats alert messages for Slack's Block Kit format.
#### Prerequisites
* [Create a Slack app and enable Incoming Webhooks](https://docs.slack.dev/messaging/sending-messages-using-incoming-webhooks/)
* Generate a webhook URL (e.g., `https://hooks.slack.com/services/T00000000/B00000000/XXXX`)
#### Setup steps
1. **In Slack**:
* Create a Slack app in your workspace
* Enable Incoming Webhooks
* Add the webhook to your desired channel
* Copy the webhook URL
2. **In Opik**:
* Go to Configuration → Alerts tab
* Click "Create new alert"
* Give your alert a descriptive name
* Select **Slack** as the destination type
* Paste your Slack webhook URL in the Endpoint URL field
* Add triggers for the events you want to monitor
* Click "Test connection" to verify
* Click "Create alert"
Opik will automatically format all alert payloads into Slack-compatible messages with rich formatting, including:
* Alert name and event type
* Event count and details
* Relevant metadata
* Links to view full details in Opik
### PagerDuty integration (Native)
Opik provides native PagerDuty integration that automatically formats alert events for PagerDuty's Events API v2.
#### Prerequisites
* A PagerDuty account with permission to create integrations
* Access to a service where you want to receive alerts
#### Setup steps
1. **In PagerDuty**:
* Navigate to Services → select your service → Integrations tab
* Click "Add Integration"
* Select "Events API V2"
* Give the integration a name (e.g., "Opik Alerts")
* Save the integration and copy the Integration Key
2. **In Opik**:
* Go to Configuration → Alerts tab
* Click "Create new alert"
* Give your alert a descriptive name
* Select **PagerDuty** as the destination type
* Enter the PagerDuty Events API v2 endpoint: `https://events.pagerduty.com/v2/enqueue`
* In the **Routing Key** field, enter your PagerDuty Integration Key (this field appears when PagerDuty is selected as the destination)
* Add triggers for the events you want to monitor
* Click "Test connection" to verify
* Click "Create alert"
Opik will automatically format all alert payloads into PagerDuty-compatible events with:
* Severity levels based on event type
* Detailed event information
* Custom fields for filtering and routing
* Deduplication keys to prevent duplicate incidents
### Custom integration with middleware service (Optional)
For more complex integrations or custom formatting requirements, you can use a middleware service to transform Opik's payload before sending it to your destination. This approach works with any destination type (General, Slack, or PagerDuty).
#### When to use middleware
* **Custom message formatting**: Transform payload structure or add custom fields
* **Multi-destination routing**: Send alerts to different endpoints based on event type
* **Additional processing**: Enrich alerts with data from other systems
* **Legacy systems**: Adapt Opik alerts to older webhook formats
#### Example middleware for Slack with custom formatting
```python
from flask import Flask, request
import requests

app = Flask(__name__)

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."  # your Slack webhook URL

def transform_to_slack(opik_payload):
    event_type = opik_payload.get('eventType')
    alert_name = opik_payload['payload']['alertName']
    event_count = opik_payload['payload']['eventCount']

    # Custom formatting logic
    return {
        "blocks": [
            {
                "type": "header",
                "text": {
                    "type": "plain_text",
                    "text": f"🚨 {alert_name}"
                }
            },
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": f"*{event_count}* new `{event_type}` events"
                }
            },
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": "View in Opik: https://www.comet.com/opik"
                }
            },
            {
                "type": "section",
                "fields": [
                    {
                        "type": "mrkdwn",
                        "text": "*Environment:*\nProduction"
                    },
                    {
                        "type": "mrkdwn",
                        "text": "*Priority:*\nHigh"
                    }
                ]
            }
        ]
    }

@app.route('/opik-to-slack', methods=['POST'])
def opik_to_slack():
    opik_data = request.json
    slack_payload = transform_to_slack(opik_data)

    # Forward to Slack
    requests.post(SLACK_WEBHOOK_URL, json=slack_payload)

    return {'status': 'success'}, 200
```
#### Setup for middleware approach
1. Deploy your middleware service to a publicly accessible endpoint
2. In Opik, create an alert with destination type **General**
3. Use your middleware service URL as the Endpoint URL
4. Configure your middleware to forward to the final destination (Slack, PagerDuty, etc.)
### Using no-code automation platforms
No-code automation tools like [n8n](https://n8n.io), [Make.com](https://www.make.com), and [IFTTT](https://ifttt.com) provide an easy way to connect Opik alerts to other services—without writing or deploying code. These platforms can receive webhooks from Opik, apply filters or conditions, and trigger actions such as sending Slack messages, logging data in Google Sheets, or creating incidents in PagerDuty.
**To use them:**
1. **Create a new workflow or scenario** and add a **Webhook trigger** node/module
2. **Copy the webhook URL** generated by the platform
3. **In Opik**, create an alert with destination type **General** and paste the webhook URL from your automation platform
4. **Secure the connection** by validating the Authorization header or including a secret token parameter
5. **Add filters or routing logic** to handle different `eventType` values from Opik (for example, `trace:errors` or `trace:feedback_score`)
6. **Chain the desired actions**, such as notifications, database updates, or analytics tracking
These tools also provide built-in monitoring, retries, and visual flow editors, making them suitable for both technical and non-technical users who want to automate Opik alert handling securely and efficiently. This approach works well when you need to route alerts to multiple destinations or apply complex business logic.
### Custom dashboard integration
Build a custom monitoring dashboard that receives alerts using the **General** destination type:
```python
from collections import Counter
from datetime import datetime

from fastapi import FastAPI, Request

app = FastAPI()

# In-memory storage (use a database in production)
alert_history = []

def group_by_type(alerts):
    """Count stored alerts by event type."""
    return Counter(alert['event_type'] for alert in alerts)

@app.post("/webhook")
async def receive_webhook(request: Request):
    data = await request.json()

    # Store alert
    alert_history.append({
        'timestamp': datetime.utcnow(),
        'event_type': data.get('eventType'),
        'alert_name': data['payload']['alertName'],
        'event_count': data['payload']['eventCount'],
        'data': data
    })

    # Keep only last 1000 alerts
    if len(alert_history) > 1000:
        alert_history.pop(0)

    return {"status": "success"}

@app.get("/dashboard")
async def get_dashboard():
    # Return aggregated statistics
    return {
        'total_alerts': len(alert_history),
        'by_type': group_by_type(alert_history),
        'recent_alerts': alert_history[-10:]
    }
```
## Supported event types
Opik supports ten types of alert events:
### Observability events
**Trace errors threshold exceeded**
* **Event type**: `trace:errors`
* **Triggered when**: Total trace error count exceeds the specified threshold within a time window
* **Project scope**: Can be configured to specific projects
* **Configuration**: Requires threshold value (error count) and time window (in seconds)
* **Payload**: Metrics alert payload with error count details
* **Use case**: Proactive error monitoring, detect error spikes, prevent system degradation
**Trace feedback score threshold exceeded**
* **Event type**: `trace:feedback_score`
* **Triggered when**: Average trace feedback score meets the specified threshold criteria within a time window
* **Project scope**: Can be configured to specific projects
* **Configuration**: Requires feedback score name, threshold value, operator (`>`, `<`), and time window
* **Payload**: Metrics alert payload with average feedback score details
* **Use case**: Track model performance, monitor user satisfaction, detect quality degradation
**Thread feedback score threshold exceeded**
* **Event type**: `trace_thread:feedback_score`
* **Triggered when**: Average thread feedback score meets the specified threshold criteria within a time window
* **Project scope**: Can be configured to specific projects
* **Configuration**: Requires feedback score name, threshold value, operator (`>`, `<`), and time window
* **Payload**: Metrics alert payload with average feedback score details
* **Use case**: Monitor conversation quality, track multi-turn interactions, detect thread satisfaction issues
**Guardrails triggered**
* **Event type**: `trace:guardrails_triggered`
* **Triggered when**: A guardrail check fails for a trace
* **Project scope**: Can be configured to specific projects
* **Payload**: Array of guardrail result objects
* **Use case**: Security monitoring, compliance tracking, PII detection
**Cost threshold exceeded**
* **Event type**: `trace:cost`
* **Triggered when**: Total trace cost exceeds the specified threshold within a time window
* **Project scope**: Can be configured to specific projects
* **Configuration**: Requires threshold value (in currency units) and time window (in seconds)
* **Payload**: Metrics alert payload with cost details
* **Use case**: Budget monitoring, cost control, prevent runaway spending
**Latency threshold exceeded**
* **Event type**: `trace:latency`
* **Triggered when**: Average trace latency exceeds the specified threshold within a time window
* **Project scope**: Can be configured to specific projects
* **Configuration**: Requires threshold value (in seconds) and time window (in seconds)
* **Payload**: Metrics alert payload with latency details
* **Use case**: Performance monitoring, SLA compliance, user experience tracking
### Prompt engineering events
**New prompt added**
* **Event type**: `prompt:created`
* **Triggered when**: A new prompt is created in the prompt library
* **Project scope**: Workspace-wide
* **Payload**: Prompt object with metadata
* **Use case**: Track prompt library changes, audit prompt creation
**New prompt version created**
* **Event type**: `prompt:committed`
* **Triggered when**: A new version (commit) is added to a prompt
* **Project scope**: Workspace-wide
* **Payload**: Prompt version object with template and metadata
* **Use case**: Monitor prompt iterations, track version history
**Prompt deleted**
* **Event type**: `prompt:deleted`
* **Triggered when**: A prompt is removed from the prompt library
* **Project scope**: Workspace-wide
* **Payload**: Array of deleted prompt objects
* **Use case**: Audit prompt deletions, maintain prompt governance
### Evaluation events
**Experiment finished**
* **Event type**: `experiment:finished`
* **Triggered when**: An experiment completes in the workspace
* **Project scope**: Workspace-wide
* **Payload**: Array of experiment objects with completion details
* **Use case**: Automate experiment notifications, track evaluation completions
### Want us to support more event types?
If you need additional event types for your use case, please [create an issue on GitHub](https://github.com/comet-ml/opik/issues/new?title=Alert%20Event%20Request%3A%20%3Cevent-name%3E\&labels=enhancement) and let us know what you'd like to monitor.
## Webhook payload structure
All webhook events follow a consistent payload structure:
```json
{
  "id": "webhook-event-id",
  "eventType": "trace:errors",
  "alertId": "alert-uuid",
  "alertName": "Production Errors Alert",
  "workspaceId": "workspace-uuid",
  "createdAt": "2025-01-15T10:30:00Z",
  "payload": {
    "alertId": "alert-uuid",
    "alertName": "Production Errors Alert",
    "eventType": "trace:errors",
    "eventIds": ["event-id-1", "event-id-2"],
    "userNames": ["user@example.com"],
    "eventCount": 2,
    "aggregationType": "consolidated",
    "message": "Alert 'Production Errors Alert': 2 trace:errors events aggregated",
    "metadata": [
      {
        "id": "trace-uuid",
        "name": "handle_query",
        "project_id": "project-uuid",
        "project_name": "Demo Project",
        "start_time": "2025-01-15T10:29:45Z",
        "end_time": "2025-01-15T10:29:50Z",
        "input": {
          "query": "User question"
        },
        "output": {
          "response": "LLM response"
        },
        "error_info": {
          "exception_type": "ValidationException",
          "message": "Validation failed",
          "traceback": "Full traceback..."
        },
        "metadata": {
          "customer_id": "customer_123"
        },
        "tags": ["production"]
      }
    ]
  }
}
```
### Payload fields
| Field | Type | Description |
| ------------------------- | ----------------- | ------------------------------------------ |
| `id` | string | Unique webhook event identifier |
| `eventType` | string | Type of event (e.g., `trace:errors`) |
| `alertId` | string (UUID) | Alert configuration identifier |
| `alertName` | string | Name of the alert |
| `workspaceId` | string | Workspace identifier |
| `createdAt` | string (ISO 8601) | Timestamp when webhook was created |
| `payload.eventIds` | array | List of aggregated event IDs |
| `payload.userNames` | array | Users associated with the events |
| `payload.eventCount` | number | Number of aggregated events |
| `payload.aggregationType` | string | Always "consolidated" |
| `payload.metadata` | array | Event-specific data (varies by event type) |
## Event-specific payloads
### Trace errors threshold exceeded payload
```json
{
  "metadata": {
    "event_type": "TRACE_ERRORS",
    "metric_name": "trace:errors",
    "metric_value": "15",
    "threshold": "10",
    "window_seconds": "900",
    "project_ids": "0198ec68-6e06-7253-a20b-d35c9252b9ba,0198ec68-6e06-7253-a20b-d35c9252b9bb",
    "project_names": "Demo Project,Default Project"
  }
}
```
### Trace feedback score threshold exceeded payload
```json
{
  "metadata": {
    "event_type": "TRACE_FEEDBACK_SCORE",
    "metric_name": "trace:feedback_score",
    "metric_value": "0.7500",
    "threshold": "0.8000",
    "window_seconds": "3600",
    "project_ids": "0198ec68-6e06-7253-a20b-d35c9252b9ba,0198ec68-6e06-7253-a20b-d35c9252b9bb",
    "project_names": "Demo Project,Default Project"
  }
}
```
### Thread feedback score threshold exceeded payload
```json
{
  "metadata": {
    "event_type": "TRACE_THREAD_FEEDBACK_SCORE",
    "metric_name": "trace_thread:feedback_score",
    "metric_value": "0.7500",
    "threshold": "0.8000",
    "window_seconds": "3600",
    "project_ids": "0198ec68-6e06-7253-a20b-d35c9252b9ba,0198ec68-6e06-7253-a20b-d35c9252b9bb",
    "project_names": "Demo Project,Default Project"
  }
}
```
### Prompt created payload
```json
{
  "metadata": {
    "id": "prompt-uuid",
    "name": "Prompt Name",
    "description": "Prompt description",
    "tags": ["system", "assistant"],
    "created_at": "2025-01-15T10:00:00Z",
    "created_by": "user@example.com",
    "last_updated_at": "2025-01-15T10:00:00Z",
    "last_updated_by": "user@example.com"
  }
}
```
### Prompt version created payload
```json
{
  "metadata": {
    "id": "version-uuid",
    "prompt_id": "prompt-uuid",
    "commit": "abc12345",
    "template": "You are a helpful assistant. {{question}}",
    "type": "mustache",
    "metadata": {
      "version": "1.0",
      "model": "gpt-4"
    },
    "created_at": "2025-01-15T10:00:00Z",
    "created_by": "user@example.com"
  }
}
```
### Prompt deleted payload
```json
{
  "metadata": [
    {
      "id": "prompt-uuid",
      "name": "Prompt Name",
      "description": "Prompt description",
      "tags": ["deprecated"],
      "created_at": "2025-01-10T10:00:00Z",
      "created_by": "user@example.com",
      "last_updated_at": "2025-01-15T10:00:00Z",
      "last_updated_by": "user@example.com",
      "latest_version": {
        "id": "version-uuid",
        "commit": "abc12345",
        "template": "Template content",
        "type": "mustache",
        "created_at": "2025-01-15T10:00:00Z",
        "created_by": "user@example.com"
      }
    }
  ]
}
```
### Guardrails triggered payload
```json
{
  "metadata": [
    {
      "id": "guardrail-check-uuid",
      "entity_id": "trace-uuid",
      "project_id": "project-uuid",
      "project_name": "Project Name",
      "name": "PII",
      "result": "failed",
      "details": {
        "detected_entities": ["EMAIL", "PHONE_NUMBER"],
        "message": "PII detected in response: email and phone number"
      }
    }
  ]
}
```
### Experiment finished payload
```json
{
  "metadata": [
    {
      "id": "experiment-uuid",
      "name": "Experiment Name",
      "dataset_id": "dataset-uuid",
      "created_at": "2025-01-15T10:00:00Z",
      "created_by": "user@example.com",
      "last_updated_at": "2025-01-15T10:05:00Z",
      "last_updated_by": "user@example.com",
      "feedback_scores": [
        {
          "name": "accuracy",
          "value": 0.92
        },
        {
          "name": "latency",
          "value": 1.5
        }
      ]
    }
  ]
}
```
### Cost threshold exceeded payload
```json
{
  "metadata": {
    "event_type": "TRACE_COST",
    "metric_name": "trace:cost",
    "metric_value": "150.75",
    "threshold": "100.00",
    "window_seconds": "3600",
    "project_ids": "0198ec68-6e06-7253-a20b-d35c9252b9ba,0198ec68-6e06-7253-a20b-d35c9252b9bb",
    "project_names": "Demo Project,Default Project"
  }
}
```
### Latency threshold exceeded payload
```json
{
  "metadata": {
    "event_type": "TRACE_LATENCY",
    "metric_name": "trace:latency",
    "metric_value": "5250.5000",
    "threshold": "5",
    "window_seconds": "1800",
    "project_ids": "0198ec68-6e06-7253-a20b-d35c9252b9ba,0198ec68-6e06-7253-a20b-d35c9252b9bb",
    "project_names": "Demo Project,Default Project"
  }
}
```
## Securing your webhooks
### Using secret tokens
Add a secret token to your webhook configuration to verify that incoming requests are from Opik:
1. Generate a secure random token (e.g., using `openssl rand -hex 32`)
2. Add it to your alert's "Secret token" field
3. Opik will send it in the `Authorization` header: `Authorization: Bearer your-secret-token`
4. Validate the token in your webhook handler before processing the request
### Example validation (Python/Flask)
```python
from flask import Flask, request, abort
import hmac

app = Flask(__name__)

SECRET_TOKEN = "your-secret-token-here"

@app.route('/webhook', methods=['POST'])
def handle_webhook():
    # Verify the secret token
    auth_header = request.headers.get('Authorization', '')
    if not auth_header.startswith('Bearer '):
        abort(401, 'Missing or invalid Authorization header')

    token = auth_header.split(' ', 1)[1]
    if not hmac.compare_digest(token, SECRET_TOKEN):
        abort(401, 'Invalid secret token')

    # Process the webhook
    data = request.json
    event_type = data.get('eventType')

    # Handle different event types (handlers are defined elsewhere in your app)
    if event_type == 'trace:errors':
        handle_trace_errors(data)
    elif event_type == 'trace:feedback_score':
        handle_feedback_score(data)
    elif event_type == 'experiment:finished':
        handle_experiment_finished(data)

    return {'status': 'success'}, 200
```
### Using custom headers
You can add custom headers for additional authentication or routing:
```python
# In your webhook handler
api_key = request.headers.get('X-API-Key')
environment = request.headers.get('X-Environment')

if api_key != EXPECTED_API_KEY:
    abort(401, 'Invalid API key')

# Route to different handlers based on environment
if environment == 'production':
    handle_production_webhook(data)
else:
    handle_staging_webhook(data)
```
## Troubleshooting
### Webhooks not being delivered
**Check endpoint accessibility:**
* Ensure your endpoint is publicly accessible (if using cloud)
* Verify firewall rules allow incoming connections
* Test your endpoint with curl: `curl -X POST -H "Content-Type: application/json" -d '{"test": "data"}' https://your-endpoint.com/webhook`
**Check webhook configuration:**
* Verify the URL starts with `http://` or `https://`
* Check that the endpoint returns 2xx status codes
* Review custom headers for syntax errors
**Check alert status:**
* Ensure the alert is enabled
* Verify at least one trigger is configured
* Check that project scope matches your events (for observability events)
### Webhook timeouts
Opik expects webhooks to respond within the configured timeout (typically 30 seconds). If your endpoint takes longer:
**Optimize your handler:**
* Return a 200 response immediately
* Process the webhook asynchronously in the background
* Use a queue system (e.g., Celery, RabbitMQ) for long-running tasks
**Example async processing:**
```python
from flask import Flask, request
from threading import Thread

app = Flask(__name__)

def process_webhook_async(data):
    # Long-running processing (helpers defined elsewhere in your app)
    send_to_slack(data)
    update_dashboard(data)
    log_to_database(data)

@app.route('/webhook', methods=['POST'])
def handle_webhook():
    data = request.json

    # Start background processing
    thread = Thread(target=process_webhook_async, args=(data,))
    thread.start()

    # Return immediately
    return {'status': 'accepted'}, 200
```
### Duplicate webhooks
If you receive duplicate webhooks:
**Check retry configuration:**
* Opik retries failed webhooks with exponential backoff
* Ensure your endpoint returns 2xx status codes on success
* Implement idempotency using the webhook `id` field
**Example idempotent handler:**
```python
from flask import Flask, request

app = Flask(__name__)

# Track processed webhook IDs (use a persistent store in production)
processed_webhook_ids = set()

@app.route('/webhook', methods=['POST'])
def handle_webhook():
    data = request.json
    webhook_id = data.get('id')

    # Skip if already processed
    if webhook_id in processed_webhook_ids:
        return {'status': 'already_processed'}, 200

    # Process webhook
    process_alert(data)

    # Mark as processed
    processed_webhook_ids.add(webhook_id)

    return {'status': 'success'}, 200
```
### Events not triggering alerts
**Check event type matching:**
* Verify the alert has a trigger for this event type
* For observability events, check project scope configuration
* Review project IDs in trigger configuration
**Check workspace context:**
* Ensure events are logged to the correct workspace
* Verify the alert is in the same workspace as your events
**Check alert evaluation:**
* View backend logs for alert evaluation messages
* Confirm events are being published to the event bus
* Check Redis for alert buckets (self-hosted deployments)
### SSL certificate errors
If you see SSL certificate errors in logs:
**For development/testing:**
* Use self-signed certificates with proper configuration
* Or use HTTP endpoints (not recommended for production)
**For production:**
* Use valid SSL certificates from trusted CAs
* Ensure certificate chain is complete
* Check certificate expiry dates
* Use services like Let's Encrypt for free SSL
## Architecture and internals
Understanding Opik's alert architecture can help with troubleshooting and optimization.
### How alerts work
The Opik Alerts system monitors your workspace for specific events and sends consolidated webhook notifications to your configured endpoints. Here's the flow:
1. **Event occurs**: An event happens in your workspace (e.g., a trace error, prompt creation, guardrail trigger, new feedback score)
2. **Alert evaluation**: The system checks if any enabled alerts match this event type and evaluates threshold conditions (for metrics-based alerts like errors, cost, latency, and feedback scores)
3. **Event aggregation**: Multiple events are aggregated over a short time window (debouncing)
4. **Webhook delivery**: A consolidated HTTP POST request is sent to your webhook URL
5. **Retry handling**: Failed requests are automatically retried with exponential backoff
#### Event debouncing
To prevent overwhelming your webhook endpoint, Opik aggregates multiple events of the same type within a short time window (typically 30-60 seconds) and sends them as a single consolidated webhook. This is particularly useful for high-frequency events like feedback scores.
### Event flow
```
1. Event occurs (e.g., trace error logged)
↓
2. Service publishes AlertEvent to EventBus
↓
3. AlertEventListener receives event
↓
4. AlertEventEvaluationService evaluates against configured alerts
↓
5. Matching events added to AlertBucketService (Redis)
↓
6. AlertJob (runs every 5 seconds) processes ready buckets
↓
7. WebhookPublisher publishes to Redis stream
↓
8. WebhookSubscriber consumes from stream
↓
9. WebhookHttpClient sends HTTP POST request
↓
10. Retries on failure with exponential backoff
```
### Debouncing mechanism
Opik uses Redis-based buckets to aggregate events:
* **Bucket key format**: `alert_bucket:{alertId}:{eventType}`
* **Window size**: Configurable (default 30-60 seconds)
* **Index**: Redis Sorted Set for efficient bucket retrieval
* **TTL**: Buckets expire automatically after processing
This prevents overwhelming your webhook endpoint with individual events and reduces costs for high-frequency events.
### Retry strategy
Failed webhooks are automatically retried:
* **Max retries**: Configurable (default 3)
* **Initial delay**: 1 second
* **Max delay**: 60 seconds
* **Backoff**: Exponential with jitter
* **Retryable errors**: 5xx status codes, network errors
* **Non-retryable errors**: 4xx status codes (except 429)
## Best practices
### Alert design
**Create focused alerts:**
* Use separate alerts for different purposes (e.g., one for errors, one for feedback)
* Configure project scope to avoid noise from test projects
* Use descriptive names that explain the alert's purpose
**Optimize for your workflow:**
* Send critical errors to PagerDuty or on-call systems
* Route feedback scores to analytics platforms
* Send prompt changes to audit logs or Slack channels
**Test thoroughly:**
* Use the "Test connection" feature before enabling alerts
* Monitor webhook delivery in your endpoint logs
* Start with a small project scope and expand gradually
### Webhook endpoint design
**Handle failures gracefully:**
* Return 2xx status codes immediately
* Process webhooks asynchronously
* Implement retry logic in your handler
* Use dead letter queues for permanent failures
**Implement security:**
* Always validate secret tokens
* Use HTTPS endpoints with valid certificates
* Implement rate limiting to prevent abuse
* Log all webhook attempts for auditing
**Monitor performance:**
* Track webhook processing time
* Alert on handler failures
* Monitor queue lengths for async processing
* Set up dead letter queue monitoring
### Scaling considerations
**For high-volume workspaces:**
* Use event debouncing (built-in)
* Implement batch processing in your handler
* Use message queues for async processing
* Consider using serverless functions (AWS Lambda, Cloud Functions)
**For multiple projects:**
* Create project-specific alerts with scope configuration
* Use custom headers to route to different handlers
* Implement filtering in your webhook handler
* Consider separate endpoints for different event types
## Next steps
* Configure your first alert for production error monitoring
* Set up Slack integration for team notifications
* Explore [Online Evaluation Rules](/production/online-evaluation/rules) for automated model monitoring
* Learn about [Guardrails](/production/gateway-guardrails/guardrails) for proactive risk detection
* Review [Production Monitoring](/tracing/dashboards/production_monitoring) best practices
***
headline: Administration Overview | Opik Documentation
og:description: Learn how to manage users, workspaces, roles, and authentication settings in Opik for your organization.
og:site_name: Opik Documentation
og:title: Opik Administration: Managing Users, Workspaces, and Permissions
subtitle: Multi-user collaboration with enterprise-grade access control
title: Overview
---------------------
Opik Cloud and Enterprise include administration features for teams and organizations, including:
* **Role-based access control**: Assign granular permissions at the organization and workspace level
* **Single sign-on (SSO)**: Authenticate users via SAML or OIDC with your identity provider
* **Workspace isolation**: Separate projects and data across teams with independent access controls
* **Service accounts**: Create API keys for CI/CD pipelines and automated workflows
* **User management**: Invite team members, assign roles, and manage access from a central dashboard
* **JWT authentication**: Integrate Opik into existing systems with token-based auth
## Who can access Workspace Settings?
* **Workspace owners** can access all settings including member management and preferences
* **Workspace members** with appropriate roles can access feedback definitions and AI providers
1. Click **Create new feedback definition**
2. Enter a **Name** for your feedback definition
3. Select the **Type**: Categorical or Numerical
4. Define the **Values** (for Categorical) or **Range** (for Numerical)
## Common Feedback Types
| Feedback Type | Type | Values |
| ---------------- | ----------- | --------------------------- |
| Thumbs Up / Down | Categorical | thumbs up, thumbs down |
| Usefulness | Categorical | Useful, Neutral, Not useful |
| Hallucination | Categorical | Yes, No |
| Correct | Categorical | Good, Bad |
***
description: Configure connections to Large Language Model providers
headline: AI Providers | Opik Documentation
og:description: Configure connections to various LLMs in Opik, manage integrations, and optimize project workflows with ease.
og:site_name: Opik Documentation
og:title: Configure AI Providers in Opik
title: AI Providers
---------------------
AI Providers let you connect LLMs for use in the Playground and Online Evaluation.
## Using AI Providers
Once configured, providers appear in two places:
1. **Playground** — Test prompts interactively with different models.
2. **Online Evaluation** — Run LLM-as-a-judge scoring on your traces.
## Adding a Provider
1. Click the **Add configuration** button in the top-right corner
2. In the Provider Configuration dialog that appears:
* Select a provider from the dropdown menu
* Enter your API key for that provider
* Click **Save** to store the configuration
### Supported Providers
Opik supports integration with various AI providers, including:
* OpenAI
* Anthropic
* OpenRouter
* Gemini
* VertexAI
* Azure OpenAI
* Amazon Bedrock
* Ollama (local or self-hosted, OpenAI-compatible)
* vLLM / any other OpenAI API-compliant provider
##### Configuration Steps
1. **Provider Name**: Enter a unique name to identify this custom provider (e.g., "vLLM Production", "Ollama Local", "Azure OpenAI Dev")
2. **URL**: Enter your server URL, for example: `http://host.docker.internal:8000/v1`
3. **API Key** (optional): If your model access requires authentication, enter the API key. Otherwise, leave this field blank.
4. **Models**: List all models available on your server. You'll be able to select one of them for use later.
5. **Custom Headers** (optional): Add any additional HTTP headers required by your custom endpoint as key-value pairs.
## Dashboard sections
The Admin Dashboard provides access to the following sections:
### Workspaces
Manage the workspaces within your organization:
* **View all workspaces**: See a list of all workspaces in your organization.
* **Create workspaces**: Add new workspaces for different teams or projects.
* **Delete workspaces**: Remove workspaces that are no longer needed.
Learn more in [Workspaces](/administration/admin-dashboard/workspaces).
### Users
Manage user access to your organization:
* **View members**: See all users in your organization and their roles.
* **View pending invitations**: See pending invitations to your organization.
* **Remove users**: Revoke access for users who should no longer be in the organization.
* **Change roles**: Update user roles at the organization level.
Learn more in [Users](/administration/admin-dashboard/users).
### Roles & Permissions
Configure workspace-level access control:
* **View workspace roles**: See the available roles and their permissions.
* **Create custom roles**: Define custom roles tailored to your organization's needs.
* **Manage permissions**: Understand and configure the permission hierarchy.
Learn more in [Roles and Permissions](/administration/roles_and_permissions).
### Authentication
Set up single sign-on authentication for your organization:
* **SAML configuration**: Configure SAML-based SSO with your identity provider.
* **OIDC configuration**: Set up OpenID Connect authentication.
* **JWT authentication**: Configure JWT token-based authentication.
Learn more in [Authentication Overview](/administration/authentication/overview).
### Service Accounts
Manage programmatic access to Opik:
* **Create service accounts**: Set up accounts for automated systems and CI/CD pipelines.
* **Manage API keys**: Generate, regenerate, and revoke API keys.
* **Configure workspace access**: Define which workspaces each service account can access.
Learn more in [Service Accounts](/administration/admin-dashboard/service_accounts).
### Billing
## Getting started
To use the BeeAI integration with Opik, you will need to have BeeAI and the required OpenTelemetry packages installed.
### Installation
#### Option 1: Using npm
```bash
npm install beeai-framework@0.1.13 @ai-sdk/openai @arizeai/openinference-instrumentation-beeai @opentelemetry/sdk-node dotenv
```
#### Option 2: Using yarn
```bash
yarn add beeai-framework@0.1.13 @ai-sdk/openai @arizeai/openinference-instrumentation-beeai @opentelemetry/sdk-node dotenv
```
## Getting started
### Create a Mastra project
If you don't have a Mastra project yet, you can create one using the Mastra CLI:
```bash
npx create-mastra
cd your-mastra-project
```
### Install required packages
Install the necessary dependencies for Mastra observability:
```bash
npm install @mastra/observability @mastra/otel
```
### Add environment variables
Create or update your `.env` file with the following variables:
## Getting started
To use the AG2 integration with Opik, you will need to have the following
packages installed:
```bash
pip install -U "ag2[openai]" opik opentelemetry-sdk opentelemetry-instrumentation-openai opentelemetry-instrumentation-threading opentelemetry-exporter-otlp
```
In addition, you will need to set the following environment variables to
configure the OpenTelemetry integration:
## Getting started
To use the Agno integration with Opik, you will need to have the following
packages installed:
```bash
pip install -U agno openai opentelemetry-sdk opentelemetry-exporter-otlp openinference-instrumentation-agno yfinance
```
In addition, you will need to set the following environment variables to
configure the OpenTelemetry integration:
## Getting started
To use the BeeAI integration with Opik, you will need to have BeeAI and the required OpenTelemetry packages installed:
```bash
pip install beeai-framework openinference-instrumentation-beeai "beeai-framework[wikipedia]" opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp
```
## Environment configuration
Configure your environment variables based on your Opik deployment:
## Getting started
To use the Autogen integration with Opik, you will need to have the following
packages installed:
```bash
pip install -U "autogen-agentchat" "autogen-ext[openai]" opik opentelemetry-sdk opentelemetry-instrumentation-openai opentelemetry-exporter-otlp
```
In addition, you will need to set the following environment variables to
configure the OpenTelemetry integration:
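For Opik Cloud, these are typically the standard OTLP exporter variables pointed at Opik's OpenTelemetry endpoint; the same variables apply to the AG2 and Agno integrations above. Treat the exact values as deployment-dependent and check the integration page for your setup:
```python
import os

# Point the OTLP exporter at Opik (values shown are for Opik Cloud; use your
# own endpoint for self-hosted deployments)
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://www.comet.com/opik/api/v1/private/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = (
    "Authorization=<your-api-key>,Comet-Workspace=<your-workspace>,projectName=<your-project>"
)
```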
## Getting Started
### Installation
First, ensure you have both `opik` and `crewai` installed:
```bash
pip install opik crewai crewai-tools
```
### Configuring Opik
Configure the Opik Python SDK for your deployment type; a minimal example follows the list below. See the [Python SDK Configuration guide](/tracing/advanced/sdk_configuration) for detailed instructions on:
* **CLI configuration**: `opik configure`
* **Code configuration**: `opik.configure()`
* **Self-hosted vs Cloud vs Enterprise** setup
* **Configuration files** and environment variables
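A minimal programmatic setup for Opik Cloud looks like this (it prompts for your API key and workspace on first run):
```python
import opik

# Prompts for your API key and workspace the first time, then caches the configuration
opik.configure(use_local=False)
```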
### Configuring CrewAI
In order to configure CrewAI, you will need to have your LLM provider API key. For this example, we'll use OpenAI. You can [find or create your OpenAI API Key in this page](https://platform.openai.com/settings/organization/api-keys).
You can set it as an environment variable:
```bash
export OPENAI_API_KEY="YOUR_API_KEY"
```
Or set it programmatically:
```python
import os
import getpass
if "OPENAI_API_KEY" not in os.environ:
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")
```
## Logging CrewAI calls
To log a CrewAI pipeline run, you can use the [`track_crewai`](https://www.comet.com/docs/opik/python-sdk-reference/integrations/crewai/track_crewai.html) function. This will log each CrewAI call to Opik, including LLM calls made by your agents.
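As a minimal sketch (the agent, task, and project name below are illustrative):
```python
from crewai import Agent, Crew, Process, Task
from opik.integrations.crewai import track_crewai

# Enable Opik logging for all CrewAI calls in this project
track_crewai(project_name="crewai-integration-demo")

researcher = Agent(
    role="Researcher",
    goal="Summarize what Opik does",
    backstory="A careful analyst who writes concise summaries.",
)
task = Task(
    description="Write a two-sentence summary of Opik.",
    expected_output="A short summary.",
    agent=researcher,
)
crew = Crew(agents=[researcher], tasks=[task], process=Process.sequential)
print(crew.kickoff())
```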
If you set `log_graph` to `True` in the `OpikCallback`, then each module graph is also displayed in the "Agent graph" tab:
***
description: Start here to integrate Opik into your Google Agent Development Kit-based genai application for end-to-end LLM observability, unit testing, and optimization.
headline: Google ADK | Opik Documentation
og:description: Build flexible AI agents using Google ADK, integrating easily with Gemini models and Google AI tools for both simple and complex architectures.
og:site_name: Opik Documentation
og:title: Develop AI Agents with Google ADK - Opik
title: Observability for Google Agent Development Kit (Python) with Opik
---------------------
The `track_adk_agent_recursive` approach automatically handles:
* **All agent callbacks** (before/after agent, model, and tool executions)
* **Sub-agents** and nested agent hierarchies
* **Agent tools** that contain other agents
* **Complex workflows** with minimal code
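As a minimal sketch (assuming a `root_agent` built as in the ADK quickstart), a single call instruments the entire hierarchy:
```python
from opik.integrations.adk import OpikTracer, track_adk_agent_recursive

opik_tracer = OpikTracer(project_name="adk-demo")
# One call wires Opik callbacks into root_agent and every nested sub-agent and agent tool
track_adk_agent_recursive(root_agent, opik_tracer)
```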
## Example 2: Manual Callback Configuration (Alternative Approach)
For fine-grained control over which callbacks to instrument, you can manually configure the [`OpikTracer`](https://www.comet.com/docs/opik/python-sdk-reference/integrations/adk/OpikTracer.html) callbacks. This approach gives you explicit control but requires more setup code:
```python
# Configure Opik tracer (same as before)
opik_tracer = OpikTracer(
    name="basic-weather-agent",
    tags=["basic", "weather", "time", "single-agent"],
    metadata={
        "environment": "development",
        "model": "gpt-4o",
        "framework": "google-adk",
        "example": "basic"
    },
    project_name="adk-basic-demo"
)
# Create the agent with explicit callback configuration
basic_agent = LlmAgent(
    name="weather_time_agent",
    model=llm,
    description="Agent for answering time & weather questions",
    instruction="Answer questions about the time or weather in a city. Be helpful and provide clear information.",
    tools=[get_weather, get_current_time],
    before_agent_callback=opik_tracer.before_agent_callback,
    after_agent_callback=opik_tracer.after_agent_callback,
    before_model_callback=opik_tracer.before_model_callback,
    after_model_callback=opik_tracer.after_model_callback,
    before_tool_callback=opik_tracer.before_tool_callback,
    after_tool_callback=opik_tracer.after_tool_callback,
)
```
The `track_adk_agent_recursive` approach is particularly powerful for:
* **Multi-agent systems** with coordinator and specialist agents
* **Sequential agents** with multiple processing steps
* **Parallel agents** executing tasks concurrently
* **Loop agents** with iterative workflows
* **Agent tools** that contain nested agents
* **Complex hierarchies** with deeply nested agent structures
By calling `track_adk_agent_recursive` once on the top-level agent, all child agents and their operations are automatically instrumented without any additional code.
## Cost Tracking
Opik automatically tracks token usage and cost for all LLM calls during agent execution, not only for Gemini models but also for models accessed via `LiteLLM`.
For more complex agent architectures, displaying the agent graph can be even more helpful:
## Example 4: Hybrid Tracing - Combining Opik Decorators with ADK Callbacks
This advanced example shows how to combine Opik's `@opik.track` decorator with ADK's callback system. This is powerful when you have complex multi-step tools that perform their own internal operations that you want to trace separately, while still maintaining the overall agent trace context.
You can use `track_adk_agent_recursive` together with `@opik.track` decorators on your tool functions for maximum visibility:
```python
from opik import track
@track(name="weather_data_processing", tags=["data-processing", "weather"])
def process_weather_data(raw_data: dict) -> dict:
    """Process raw weather data with additional computations."""
    # Simulate some data processing steps that we want to trace separately
    processed = {
        "temperature_celsius": raw_data.get("temp_c", 0),
        "temperature_fahrenheit": raw_data.get("temp_c", 0) * 9/5 + 32,
        "conditions": raw_data.get("condition", "unknown"),
        "comfort_index": "comfortable" if 18 <= raw_data.get("temp_c", 0) <= 25 else "less comfortable"
    }
    return processed
@track(name="location_validation", tags=["validation", "location"])
def validate_location(city: str) -> dict:
    """Validate and normalize city names."""
    # Simulate location validation logic that we want to trace
    normalized_cities = {
        "nyc": "New York",
        "ny": "New York",
        "new york city": "New York",
        "london uk": "London",
        "london england": "London",
        "tokyo japan": "Tokyo"
    }
    city_lower = city.lower().strip()
    validated_city = normalized_cities.get(city_lower, city.title())
    return {
        "original": city,
        "validated": validated_city,
        "is_valid": city_lower in ["new york", "london", "tokyo"] or city_lower in normalized_cities
    }
@track(name="advanced_weather_lookup", tags=["weather", "api-simulation"])
def get_advanced_weather(city: str) -> dict:
    """Get weather with internal processing steps tracked by Opik decorators."""
    # Step 1: Validate location (traced by @opik.track)
    location_result = validate_location(city)
    if not location_result["is_valid"]:
        return {
            "status": "error",
            "error_message": f"Invalid location: {city}"
        }
    validated_city = location_result["validated"]
    # Step 2: Get raw weather data (simulated)
    raw_weather_data = {
        "New York": {"temp_c": 25, "condition": "sunny", "humidity": 65},
        "London": {"temp_c": 18, "condition": "cloudy", "humidity": 78},
        "Tokyo": {"temp_c": 22, "condition": "partly cloudy", "humidity": 70}
    }
    if validated_city not in raw_weather_data:
        return {
            "status": "error",
            "error_message": f"Weather data unavailable for {validated_city}"
        }
    raw_data = raw_weather_data[validated_city]
    # Step 3: Process the data (traced by @opik.track)
    processed_data = process_weather_data(raw_data)
    return {
        "status": "success",
        "city": validated_city,
        "report": f"Weather in {validated_city}: {processed_data['conditions']}, {processed_data['temperature_celsius']}°C ({processed_data['temperature_fahrenheit']:.1f}°F). Comfort level: {processed_data['comfort_index']}.",
        "raw_humidity": raw_data["humidity"]
    }
# Configure Opik tracer for hybrid example
hybrid_tracer = OpikTracer(
    name="hybrid-tracing-agent",
    tags=["hybrid", "decorators", "callbacks", "advanced"],
    metadata={
        "environment": "development",
        "model": "gpt-4o",
        "framework": "google-adk",
        "example": "hybrid-tracing",
        "tracing_methods": ["decorators", "callbacks"]
    },
    project_name="adk-hybrid-demo"
)
# Create hybrid agent that combines both tracing approaches
hybrid_agent = LlmAgent(
    name="advanced_weather_time_agent",
    model=llm,
    description="Advanced agent with hybrid Opik tracing using both decorators and callbacks",
    instruction="""You are an advanced weather and time agent that provides detailed information with comprehensive internal processing.
Your tools perform multi-step operations that are individually traced, giving detailed visibility into the processing pipeline.
Use the advanced weather and time tools to provide thorough, well-processed information to users.""",
    tools=[get_advanced_weather],
)
# Instrument the agent with track_adk_agent_recursive
# The @opik.track decorators in your tools will automatically create child spans
from opik.integrations.adk import track_adk_agent_recursive
track_adk_agent_recursive(hybrid_agent, hybrid_tracer)
```
The trace can now be viewed in the UI:
## Compatibility with @track Decorator
The `OpikTracer` is fully compatible with the `@track` decorator, allowing you to create hybrid tracing approaches that combine ADK agent tracking with custom function tracing.
You can both invoke your agent from inside another tracked function and call tracked functions inside your tool functions; all parent-child relationships between spans and traces will be preserved!
## Thread Support
The Opik integration automatically handles ADK sessions and maps them to Opik threads for conversational applications:
```python
from opik.integrations.adk import OpikTracer
from google.adk import sessions as adk_sessions, runners as adk_runners
# ADK session management
session_service = adk_sessions.InMemorySessionService()
session = session_service.create_session_sync(
app_name="my_app",
user_id="user_123",
session_id="conversation_456"
)
opik_tracer = OpikTracer()
runner = adk_runners.Runner(
agent=your_agent,
app_name="my_app",
session_service=session_service
)
# All traces will be automatically grouped by session_id as thread_id
```
The integration automatically:
* Uses the ADK session ID as the Opik thread ID
* Groups related conversations and interactions
* Logs `app_name` and `user_id` as metadata
* Maintains conversation context across multiple interactions
You can view your session as a whole conversation and easily navigate to any specific trace you need.
## Error Tracking
The `OpikTracer` provides comprehensive error tracking and monitoring:
* **Automatic error capture** for agent execution failures
* **Detailed stack traces** with full context information
* **Tool execution errors** with input/output data
* **Model call failures** with provider-specific error details
Error information is automatically logged to spans and traces, making it easy to debug issues in production:
## Troubleshooting: Missing Trace
When using `Runner.run_async`, make sure to process all events completely, even after finding the final response (when `event.is_final_response()` is `True`). If you exit the loop too early, OpikTracer won't log the final response and your trace will be incomplete. Don't use code that stops processing events prematurely:
```python
async for event in runner.run_async(user_id=user_id, session_id=session_id, new_message=content):
    if event.is_final_response():
        ...
        break  # Stop processing events once the final response is found
```
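Instead, drain the iterator completely. One way to do this (a sketch that remembers the final response rather than breaking out of the loop):
```python
final_response_text = None
async for event in runner.run_async(user_id=user_id, session_id=session_id, new_message=content):
    if event.is_final_response() and event.content and event.content.parts:
        final_response_text = event.content.parts[0].text
    # No break: consuming every event lets OpikTracer log the final response

print(final_response_text)
```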
There is an upstream discussion about how to best solve this source of confusion: [https://github.com/google/adk-python/issues/1695](https://github.com/google/adk-python/issues/1695).
## Cost Tracking
The `OpikConnector` automatically tracks token usage and cost for all supported LLM models used within Haystack pipelines.
Cost information is automatically captured and displayed in the Opik UI, including:
* Token usage details
* Cost per request based on model pricing
* Total trace cost
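Cost capture assumes the connector is part of your pipeline; a minimal sketch (the component name is illustrative):
```python
from haystack import Pipeline
from opik.integrations.haystack import OpikConnector

pipe = Pipeline()
# The tracer component records a trace (including model costs) for each pipeline run
pipe.add_component("tracer", OpikConnector("Chat example"))
```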
Opik integrates with Harbor to log traces for all trial executions, including:
* **Trial results** as Opik traces with timing, metadata, and feedback scores from verifier rewards
* **Trajectory steps** as nested spans showing the complete agent-environment interaction
* **Tool calls and observations** as detailed execution records
* **Token usage and costs** aggregated from ATIF metrics
## Account Setup
[Comet](https://www.comet.com/site?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=harbor\&utm_campaign=opik) provides a hosted version of the Opik platform, [simply create an account](https://www.comet.com/signup?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=harbor\&utm_campaign=opik) and grab your API Key.
> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/overview/?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=harbor\&utm_campaign=opik) for more information.
## Getting Started
### Installation
First, ensure you have both `opik` and `harbor` installed:
```bash
pip install opik harbor
```
### Configuring Opik
Configure the Opik Python SDK for your deployment type. See the [Python SDK Configuration guide](/tracing/advanced/sdk_configuration) for detailed instructions on:
* **CLI configuration**: `opik configure`
* **Code configuration**: `opik.configure()`
* **Self-hosted vs Cloud vs Enterprise** setup
* **Configuration files** and environment variables
### Configuring Harbor
Harbor requires configuration for the agent and benchmark you want to evaluate. Refer to the [Harbor documentation](https://github.com/laude-institute/harbor) for details on setting up your job configuration.
## Using the CLI
The easiest way to use Harbor with Opik is through the `opik harbor` CLI command. This automatically enables Opik tracking for all trial executions without modifying your code.
### Basic Usage
```bash
# Run a benchmark with Opik tracking
opik harbor run -d terminal-bench@head -a terminus_2 -m gpt-4.1
# Use a configuration file
opik harbor run -c config.yaml
```
### Specifying Project Name
```bash
# Set project name via environment variable
export OPIK_PROJECT_NAME=my-benchmark
opik harbor run -d swebench@lite
```
### Available CLI Commands
All Harbor CLI commands are available as subcommands:
```bash
# Run a job (alias for jobs start)
opik harbor run [HARBOR_OPTIONS]
# Job management
opik harbor jobs start [HARBOR_OPTIONS]
opik harbor jobs resume -p ./jobs/my-job
# Single trial
opik harbor trials start -p ./my-task -a terminus_2
```
### CLI Help
```bash
# View available options
opik harbor --help
opik harbor run --help
```
## Example: SWE-bench Evaluation
Here's a complete example running a SWE-bench evaluation with Opik tracking:
```bash
# Configure Opik
opik configure
# Set project name
export OPIK_PROJECT_NAME=swebench-claude-sonnet
# Run SWE-bench evaluation with tracking
opik harbor run \
-d swebench-lite@head \
-a claude-code \
-m claude-3-5-sonnet-20241022
```
## Custom Agents
Harbor supports integrating your own custom agents without modifying the Harbor source code. There are two types of agents you can create:
* **External agents** - Interface with the environment through the `BaseEnvironment` interface, typically by executing bash commands
* **Installed agents** - Installed directly into the container environment and executed in headless mode
For details on implementing custom agents, see the [Harbor Agents documentation](https://harborframework.com/docs/agents).
### Running Custom Agents with Opik
To run a custom agent with Opik tracking, use the `--agent-import-path` flag:
```bash
opik harbor run -d "terminal-bench@head" --agent-import-path path.to.agent:MyCustomAgent
```
### Tracking Custom Agent Functions
When building custom agents, you can use Opik's `@track` decorator on methods within your agent implementation. These decorated functions will automatically be captured as spans within the trial trace, giving you detailed visibility into your agent's internal logic:
```python
from harbor.agents.base import BaseAgent
from opik import track
class MyCustomAgent(BaseAgent):
    @staticmethod
    def name() -> str:
        return "my-custom-agent"
    @track
    async def plan_next_action(self, observation: str) -> str:
        # This function will appear as a span in Opik
        # Add your planning logic here
        return action
    @track
    async def execute_tool(self, tool_name: str, args: dict) -> str:
        # This will also be tracked as a nested span
        result = await self._run_tool(tool_name, args)
        return result
    async def run(self, instruction: str, environment, context) -> None:
        # Your main agent loop
        while not done:
            observation = await environment.exec("pwd")
            action = await self.plan_next_action(observation)
            result = await self.execute_tool(action.tool, action.args)
```
This allows you to trace not just the ATIF trajectory steps, but also the internal decision-making processes of your custom agent.
## What Gets Logged
Each trial completion creates an Opik trace with:
* Trial name and task information as the trace name and input
* Agent execution timing as start/end times
* Verifier rewards (e.g., pass/fail, tests passed) as feedback scores
* Agent and model metadata
* Exception information if the trial failed
### Trajectory Spans
The integration automatically creates spans for each step in the agent's trajectory, giving you detailed visibility into the agent-environment interaction. Each trajectory step becomes a span showing:
* The step source (user, agent, or system)
* The message content
* Tool calls and their arguments
* Observation results from the environment
* Token usage and cost per step
* Model name for agent steps
### Verifier Rewards as Feedback Scores
Harbor's verifier produces rewards like `{"pass": 1, "tests_passed": 5}`. These are automatically converted to Opik feedback scores, allowing you to:
* Filter traces by pass/fail status
* Aggregate metrics across experiments
* Compare agent performance across benchmarks
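For example, you could pull failing trials back out with the Python SDK (a sketch, assuming a feedback score named `pass` and a project called `my-benchmark`):
```python
import opik

client = opik.Opik()
traces = client.search_traces(project_name="my-benchmark", max_results=200)

# Keep only trials whose verifier reward "pass" was 0 (i.e. failing trials)
failed = [
    t for t in traces
    if any(s.name == "pass" and s.value == 0 for s in (t.feedback_scores or []))
]
print(f"{len(failed)} failing trials out of {len(traces)}")
```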
## Cost Tracking
The Harbor integration automatically extracts token usage and cost from ATIF trajectory metrics. If your agent records `prompt_tokens`, `completion_tokens`, and `cost_usd` in step metrics, these are captured in Opik spans.
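For illustration, a step whose metrics would be picked up might look like this (only the three metric keys come from the sentence above; the surrounding structure is hypothetical, see the ATIF specification for the real schema):
```python
# Hypothetical ATIF-style step with the documented metric keys
step = {
    "source": "agent",
    "message": "Running the test suite...",
    "metrics": {
        "prompt_tokens": 1250,
        "completion_tokens": 310,
        "cost_usd": 0.0042,
    },
}
```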
## Environment Variables
| Variable | Description |
| ------------------- | ------------------------------- |
| `OPIK_PROJECT_NAME` | Default project name for traces |
| `OPIK_API_KEY` | API key for Opik Cloud |
| `OPIK_WORKSPACE` | Workspace name (for Opik Cloud) |
### Getting Help
* Check the [Harbor documentation](https://github.com/laude-institute/harbor) for agent and benchmark setup
* Review the [ATIF specification](https://www.harborframework.com/docs/agents/trajectory-format) for trajectory format details
* Open an issue on [GitHub](https://github.com/comet-ml/opik/issues) for Opik integration questions
***
description: Start here to integrate Opik into your Instructor-based genai application for structured output tracking, schema validation monitoring, and LLM call observability.
headline: Instructor | Opik Documentation
og:description: Learn to integrate Opik with Instructor to log all calls as traces, enhancing your structured output management in LLMs.
og:site_name: Opik Documentation
og:title: Integrate Instructor with Opik for Enhanced Tracing
title: Structured Output Tracking for Instructor with Opik
---------------------
[Instructor](https://github.com/instructor-ai/instructor) is a Python library for working with structured outputs
for LLMs built on top of Pydantic. It provides a simple way to manage schema validations, retries and streaming responses.
In this guide, we will showcase how to integrate Opik with Instructor so that all the Instructor calls are logged as traces in Opik.
## Account Setup
[Comet](https://www.comet.com/site?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=instructor\&utm_campaign=opik) provides a hosted version of the Opik platform, [simply create an account](https://www.comet.com/signup?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=instructor\&utm_campaign=opik) and grab your API Key.
> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/overview/?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=instructor\&utm_campaign=opik) for more information.
## Getting Started
### Installation
First, ensure you have both `opik` and `instructor` installed:
```bash
pip install opik instructor
```
### Configuring Opik
Configure the Opik Python SDK for your deployment type. See the [Python SDK Configuration guide](/tracing/advanced/sdk_configuration) for detailed instructions on:
* **CLI configuration**: `opik configure`
* **Code configuration**: `opik.configure()`
* **Self-hosted vs Cloud vs Enterprise** setup
* **Configuration files** and environment variables
### Configuring Instructor
In order to use Instructor, you will need to configure your LLM provider API keys. For this example, we'll use OpenAI, Anthropic, and Gemini. You can [find or create your API keys in these pages](https://platform.openai.com/settings/organization/api-keys):
You can set them as environment variables:
```bash
export OPENAI_API_KEY="YOUR_API_KEY"
export ANTHROPIC_API_KEY="YOUR_API_KEY"
export GOOGLE_API_KEY="YOUR_API_KEY"
```
Or set them programmatically:
```python
import os
import getpass
if "OPENAI_API_KEY" not in os.environ:
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")
if "ANTHROPIC_API_KEY" not in os.environ:
os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("Enter your Anthropic API key: ")
if "GOOGLE_API_KEY" not in os.environ:
os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter your Google API key: ")
```
## Using Opik with the Instructor library
In order to log traces from Instructor into Opik, we are going to patch the `instructor` library. This will log each LLM call to the Opik platform.
For all the integrations, we will first add tracking to the LLM client and then pass it to the Instructor library:
```python
from opik.integrations.openai import track_openai
import instructor
from pydantic import BaseModel
from openai import OpenAI
# We will first create the OpenAI client and add the `track_openai`
# method to log data to Opik
openai_client = track_openai(OpenAI())
# Patch the OpenAI client for Instructor
client = instructor.from_openai(openai_client)
# Define your desired output structure
class UserInfo(BaseModel):
    name: str
    age: int
user_info = client.chat.completions.create(
model="gpt-4o-mini",
response_model=UserInfo,
messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)
print(user_info)
```
Thanks to the `track_openai` method, all calls made to OpenAI will be logged to the Opik platform. This approach also works well with the `opik.track` decorator, as the LLM call made with Instructor will automatically be attached to the relevant trace.
## Integrating with other LLM providers
The Instructor library supports many LLM providers beyond OpenAI, including Anthropic, AWS Bedrock, and Gemini. Opik supports most of these providers as well.
Here are the code snippets needed for the integration with different providers:
### Anthropic
```python
from opik.integrations.anthropic import track_anthropic
import instructor
from anthropic import Anthropic
# Add Opik tracking
anthropic_client = track_anthropic(Anthropic())
# Patch the Anthropic client for Instructor
client = instructor.from_anthropic(
anthropic_client, mode=instructor.Mode.ANTHROPIC_JSON
)
user_info = client.chat.completions.create(
model="claude-3-5-sonnet-20241022",
response_model=UserInfo,
messages=[{"role": "user", "content": "John Doe is 30 years old."}],
max_tokens=1000,
)
print(user_info)
```
### Gemini
```python
from opik.integrations.genai import track_genai
import instructor
from google import genai
# Add Opik tracking
gemini_client = track_genai(genai.Client())
# Patch the GenAI client for Instructor
client = instructor.from_genai(
gemini_client, mode=instructor.Mode.GENAI_STRUCTURED_OUTPUTS
)
user_info = client.chat.completions.create(
model="gemini-2.0-flash-001",
response_model=UserInfo,
messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)
print(user_info)
```
You can read more about how to use the Instructor library in [their documentation](https://python.useinstructor.com/).
***
description: Start here to integrate Opik into your LangChain-based genai application for end-to-end LLM observability, unit testing, and optimization.
headline: LangChain | Opik Documentation
og:description: Capture detailed insights and track costs in your LangChain applications seamlessly using Opik's integration with automatic logging features.
og:site_name: Opik Documentation
og:title: Unlock LangChain's Potential with Opik
title: Observability for LangChain (Python) with Opik
---------------------
## Cost Tracking
The `OpikTracer` automatically tracks token usage and cost for all supported LLM models used within LangChain applications.
Cost information is automatically captured and displayed in the Opik UI, including:
* Token usage details
* Cost per request based on model pricing
* Total trace cost
## Practical Example: Classification Workflow
Let's walk through a real-world example of using LangGraph with Opik for a classification workflow. This example demonstrates how to create a graph with conditional routing and track its execution.
### Setting up the Environment
First, let's set up our environment with the necessary dependencies:
```python
import opik
# Configure Opik
opik.configure(use_local=False)
```
### Creating the LangGraph Workflow
We'll create a LangGraph workflow with 3 nodes that demonstrates conditional routing:
```python
from langgraph.graph import StateGraph, END
from typing import TypedDict, Optional
# Define the graph state
class GraphState(TypedDict, total=False):
    question: Optional[str]
    classification: Optional[str]
    response: Optional[str]
# Create the node functions
def classify(question: str) -> str:
    return "greeting" if question.startswith("Hello") else "search"
def classify_input_node(state):
    question = state.get("question", "").strip()
    classification = classify(question)
    return {"classification": classification}
def handle_greeting_node(state):
    return {"response": "Hello! How can I help you today?"}
def handle_search_node(state):
    question = state.get("question", "").strip()
    search_result = f"Search result for '{question}'"
    return {"response": search_result}
# Create the workflow
workflow = StateGraph(GraphState)
workflow.add_node("classify_input", classify_input_node)
workflow.add_node("handle_greeting", handle_greeting_node)
workflow.add_node("handle_search", handle_search_node)
# Add conditional routing
def decide_next_node(state):
    return (
        "handle_greeting"
        if state.get("classification") == "greeting"
        else "handle_search"
    )
workflow.add_conditional_edges(
"classify_input",
decide_next_node,
{"handle_greeting": "handle_greeting", "handle_search": "handle_search"},
)
workflow.set_entry_point("classify_input")
workflow.add_edge("handle_greeting", END)
workflow.add_edge("handle_search", END)
app = workflow.compile()
```
### Executing with Opik Tracing
Now let's execute the workflow with Opik tracing enabled using `track_langgraph`:
```python
from opik.integrations.langchain import OpikTracer, track_langgraph
# Create OpikTracer and track the graph once
# The graph visualization is automatically extracted by track_langgraph
opik_tracer = OpikTracer(
project_name="classification-workflow"
)
app = track_langgraph(app, opik_tracer)
# Execute the workflow - no callbacks needed!
inputs = {"question": "Hello, how are you?"}
result = app.invoke(inputs)
print(result)
# Test with a different input - still tracked automatically
inputs = {"question": "What is machine learning?"}
result = app.invoke(inputs)
print(result)
```
The graph execution is now logged on the Opik platform and can be viewed in the UI. The trace will show the complete execution path through the graph, including the classification decision and the chosen response path.
## Compatibility with Opik tracing context
LangGraph tracing integrates seamlessly with Opik's tracing context, allowing you to call `@track`-decorated functions (and use most other native Opik integrations) from within your graph nodes and have them automatically attached to the trace tree.
### Synchronous execution (invoke)
For synchronous graph execution using `invoke()`, everything works out of the box. You can access current spans/traces from LangGraph nodes and call tracked functions inside them:
```python
from opik import opik_context, track
from opik.integrations.langchain import OpikTracer, track_langgraph
from langgraph.graph import StateGraph, START, END
@track
def process_data(value: int) -> int:
    """Custom tracked function that will be attached to the trace tree."""
    return value * 2
def my_node(state):
    current_trace_data = opik_context.get_current_trace_data()
    current_span_data = opik_context.get_current_span_data()  # will return the span for `my_node`, created by OpikTracer
    # This tracked function call will automatically be part of the trace tree
    result = process_data(state["value"])
    return {"value": result}
# Build and execute graph
graph = StateGraph(dict)
graph.add_node("processor", my_node)
graph.add_edge(START, "processor")
graph.add_edge("processor", END)
app = graph.compile()
opik_tracer = OpikTracer()
app = track_langgraph(app, opik_tracer)
# Synchronous execution - tracked functions work automatically
result = app.invoke({"value": 21})
```
### Asynchronous execution (ainvoke)
For asynchronous graph execution using `ainvoke()`, you need to explicitly propagate the trace context to `@track`-decorated functions using the `extract_current_langgraph_span_data` helper:
## What gets traced
With this setup, your LiveKit agent will automatically trace:
* **Session events**: Session start and end with metadata
* **Agent turns**: Complete conversation turns with timing
* **LLM operations**: Model calls, prompts, responses, and token usage
* **Function tools**: Tool executions with inputs and outputs
* **TTS operations**: Text-to-speech conversions with audio metadata
* **STT operations**: Speech-to-text transcriptions
* **End-of-turn detection**: Conversation flow events
## Further improvements
If you have any questions or suggestions for improving the LiveKit Agents integration, please [open an issue](https://github.com/comet-ml/opik/issues/new/choose) on our GitHub repository.
***
description: Start here to integrate Opik into your LlamaIndex-based genai application for end-to-end LLM observability, unit testing, and optimization.
headline: Llama Index | Opik Documentation
og:description: Build powerful LLM applications using LlamaIndex to easily connect, structure, and query your diverse data sources.
og:site_name: Opik Documentation
og:title: Llama Index: Build LLM Apps with Opik Frameworks
title: Observability for LlamaIndex with Opik
---------------------
[LlamaIndex](https://github.com/run-llama/llama_index) is a flexible data framework for building LLM applications:
LlamaIndex is a "data framework" to help you build LLM apps. It provides the following tools:
* Offers data connectors to ingest your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.).
* Provides ways to structure your data (indices, graphs) so that this data can be easily used with LLMs.
* Provides an advanced retrieval/query interface over your data: Feed in any LLM input prompt, get back retrieved context and knowledge-augmented output.
* Allows easy integrations with your outer application framework (e.g. with LangChain, Flask, Docker, ChatGPT, anything else).
## Account Setup
[Comet](https://www.comet.com/site?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=llamaindex\&utm_campaign=opik) provides a hosted version of the Opik platform, [simply create an account](https://www.comet.com/signup?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=llamaindex\&utm_campaign=opik) and grab your API Key.
> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/overview/?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=llamaindex\&utm_campaign=opik) for more information.
## Getting Started
### Installation
To use the Opik integration with LlamaIndex, you'll need to have both the `opik` and `llama_index` packages installed. You can install them using pip:
```bash
pip install opik llama-index llama-index-agent-openai llama-index-llms-openai llama-index-callbacks-opik
```
### Configuring Opik
Configure the Opik Python SDK for your deployment type. See the [Python SDK Configuration guide](/tracing/advanced/sdk_configuration) for detailed instructions on:
* **CLI configuration**: `opik configure`
* **Code configuration**: `opik.configure()`
* **Self-hosted vs Cloud vs Enterprise** setup
* **Configuration files** and environment variables
### Configuring LlamaIndex
To use LlamaIndex, you will need to configure your LLM provider API keys. For this example, we'll use OpenAI. You can [find or create your OpenAI API key on this page](https://platform.openai.com/settings/organization/api-keys).
You can set them as environment variables:
```bash
export OPENAI_API_KEY="YOUR_API_KEY"
```
Or set them programmatically:
```python
import os
import getpass
if "OPENAI_API_KEY" not in os.environ:
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")
```
## Using the Opik integration
To use the Opik integration with LlamaIndex, you can use the `set_global_handler` function from the LlamaIndex package to set the global tracer:
```python
from llama_index.core import global_handler, set_global_handler
set_global_handler("opik")
opik_callback_handler = global_handler
```
Now that the integration is set up, all the LlamaIndex runs will be traced and logged to Opik.
Alternatively, you can configure the callback handler directly for more control:
```python
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from opik.integrations.llama_index import LlamaIndexCallbackHandler
# Basic setup
opik_callback = LlamaIndexCallbackHandler()
# Or with optional parameters
opik_callback = LlamaIndexCallbackHandler(
project_name="my-llamaindex-project", # Set custom project name
skip_index_construction_trace=True # Skip tracking index construction
)
Settings.callback_manager = CallbackManager([opik_callback])
```
The `skip_index_construction_trace` parameter is useful when you want to track only query operations and not the index construction phase (particularly for large document sets or pre-built indexes).
## Example
To showcase the integration, we will create a new query engine that uses Paul Graham's essays as the data source.
**First step:**
Configure the Opik integration:
```python
import os
from llama_index.core import global_handler, set_global_handler
# Set project name for better organization
os.environ["OPIK_PROJECT_NAME"] = "llamaindex-integration-demo"
set_global_handler("opik")
opik_callback_handler = global_handler
```
**Second step:**
Download the example data:
```python
import os
import requests
# Create directory if it doesn't exist
os.makedirs('./data/paul_graham/', exist_ok=True)
# Download the file using requests
url = 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt'
response = requests.get(url)
with open('./data/paul_graham/paul_graham_essay.txt', 'wb') as f:
    f.write(response.content)
```
**Third step:**
Configure the OpenAI API key:
```python
import os
import getpass
if "OPENAI_API_KEY" not in os.environ:
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")
```
**Fourth step:**
We can now load the data, create an index and query engine:
```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
```
Given that the integration with Opik has been set up, all the traces are logged to the Opik platform:
## Using with the @track Decorator
The LlamaIndex integration seamlessly works with Opik's `@track` decorator. When you call LlamaIndex operations inside a tracked function, the LlamaIndex traces will automatically be attached as child spans to your existing trace.
```python
import opik
from llama_index.core import global_handler, set_global_handler
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
# Configure Opik integration
set_global_handler("opik")
opik_callback_handler = global_handler
@opik.track()
def my_llm_application(user_query: str):
"""Process user query with LlamaIndex"""
llm = OpenAI(model="gpt-3.5-turbo")
messages = [
ChatMessage(role="system", content="You are a helpful assistant."),
ChatMessage(role="user", content=user_query),
]
response = llm.chat(messages)
return response.message.content
# Call the tracked function
result = my_llm_application("What is the capital of France?")
print(result)
```
In this example, Opik will create a trace for the `my_llm_application` function, and all LlamaIndex operations (like the LLM chat call) will appear as nested spans within this trace, giving you a complete view of your application's execution.
## Using with Manual Trace Creation
You can also manually create traces using `opik.start_as_current_trace()` and have LlamaIndex operations nested within:
```python
import opik
from llama_index.core import global_handler, set_global_handler
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
# Configure Opik integration
set_global_handler("opik")
opik_callback_handler = global_handler
# Create a manual trace
with opik.start_as_current_trace(name="user_query_processing"):
    llm = OpenAI(model="gpt-3.5-turbo")
    messages = [
        ChatMessage(role="user", content="Explain quantum computing in simple terms"),
    ]
    response = llm.chat(messages)
    print(response.message.content)
```
This approach is useful when you want more control over trace naming and want to group multiple LlamaIndex operations under a single trace.
## Tracking LlamaIndex Workflows
LlamaIndex workflows are multi-step processing pipelines for LLM applications. To track workflow executions in Opik, you can manually decorate your workflow steps and use `opik.start_as_current_span()` to wrap the workflow execution.
### Basic Workflow Tracking
You can use `@opik.track()` to decorate your workflow steps and `opik.start_as_current_span()` to track the workflow execution:
```python
import opik
from llama_index.core.workflow import Workflow, StartEvent, StopEvent, step, Event
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from llama_index.core import global_handler, set_global_handler
# Configure Opik integration for LLM calls within steps
set_global_handler("opik")
class QueryEvent(Event):
    """Event for passing query through workflow."""
    query: str
class MyRAGWorkflow(Workflow):
    """Simple RAG workflow with tracked steps."""
    @step
    @opik.track()
    async def retrieve_context(self, ev: StartEvent) -> QueryEvent:
        """Retrieve relevant context for the query."""
        query = ev.get("query", "")
        # Your retrieval logic here
        context = f"Context for: {query}"
        return QueryEvent(query=f"{context} | {query}")
    @step
    @opik.track()
    async def generate_response(self, ev: QueryEvent) -> StopEvent:
        """Generate final response using the context."""
        # Your generation logic here
        result = f"Response based on: {ev.query}"
        return StopEvent(result=result)
# Create workflow instance
workflow = MyRAGWorkflow()
# Use start_as_current_span to track workflow execution
with opik.start_as_current_span(
    name="rag_workflow_execution",
    input={"query": "What are the key features?"},
    project_name="llama-index-workflows"
) as span:
    result = await workflow.run(query="What are the key features?")
    span.update(output={"result": result})
    print(result)
opik.flush_tracker()  # Ensure all traces are sent
```
In this example:
* Each workflow step is decorated with `@opik.track()` to create spans
* The `@step` decorator is placed before `@opik.track()` to ensure LlamaIndex can properly discover the workflow steps
* `opik.start_as_current_span()` tracks the overall workflow execution
* LLM calls within steps are automatically tracked via the global Opik handler
* All workflow steps appear as nested spans within the workflow trace
## Getting started
To use the Microsoft Agent Framework integration with Opik, you will need to have the Agent Framework and the required OpenTelemetry packages installed:
```bash
pip install --pre agent-framework opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp
```
In addition, you will need to set the following environment variables to configure OpenTelemetry to send data to Opik:
You can create agents with custom function tools. The `OpikTracingProcessor` automatically captures all tool calls as well:
```python
from agents import Agent, Runner, function_tool, set_trace_processors
from opik.integrations.openai.agents import OpikTracingProcessor
set_trace_processors(processors=[OpikTracingProcessor()])
@function_tool
def calculate_average(numbers: list[float]) -> float:
    return sum(numbers) / len(numbers)
@function_tool
def get_recommendation(topic: str, user_level: str) -> str:
    recommendations = {
        "python": {
            "beginner": "Start with Python.org's tutorial, then try Python Crash Course book. Practice with simple scripts and built-in functions.",
            "intermediate": "Explore frameworks like Flask/Django, learn about decorators, context managers, and dive into Python's data structures.",
            "advanced": "Study Python internals, contribute to open source, learn about metaclasses, and explore performance optimization."
        },
        "machine learning": {
            "beginner": "Start with Andrew Ng's Coursera course, learn basic statistics, and try scikit-learn with simple datasets.",
            "intermediate": "Dive into deep learning with TensorFlow/PyTorch, study different algorithms, and work on real projects.",
            "advanced": "Research latest papers, implement algorithms from scratch, and contribute to ML frameworks."
        }
    }
    topic_lower = topic.lower()
    level_lower = user_level.lower()
    if topic_lower in recommendations and level_lower in recommendations[topic_lower]:
        return recommendations[topic_lower][level_lower]
    else:
        return f"For {topic} at {user_level} level: Focus on fundamentals, practice regularly, and build projects to apply your knowledge."
def create_advanced_agent():
    """Create an advanced agent with tools and comprehensive instructions."""
    instructions = """
    You are an expert programming tutor and learning advisor. You have access to tools that help you:
    1. Calculate averages for performance metrics, grades, or other numerical data
    2. Provide personalized learning recommendations based on topics and user experience levels
    Your role:
    - Help users learn programming concepts effectively
    - Provide clear, beginner-friendly explanations when needed
    - Use your tools when appropriate to give concrete help
    - Offer structured learning paths and resources
    - Be encouraging and supportive
    When users ask about:
    - Programming languages: Use get_recommendation to provide tailored advice
    - Performance or scores: Use calculate_average if numbers are involved
    - Learning paths: Combine your knowledge with tool-based recommendations
    Always explain your reasoning and make your responses educational.
    """
    return Agent(
        name="AdvancedProgrammingTutor",
        instructions=instructions,
        model="gpt-4o-mini",
        tools=[calculate_average, get_recommendation]
    )
# Create and use the advanced agent
advanced_agent = create_advanced_agent()
# Example queries
queries = [
    "I'm new to Python programming. Can you tell me about it?",
    "I got these test scores: 85, 92, 78, 96, 88. What's my average and how am I doing?",
    "I know some Python basics but want to learn machine learning. What should I do next?",
]
for i, query in enumerate(queries, 1):
    print(f"\n📝 Query {i}: {query}")
    result = Runner.run_sync(advanced_agent, query)
    print(f"🤖 Response: {result.final_output}")
    print("=" * 80)
```
### Adding granularity with the `@track` decorator
If you need more visibility into what happens inside your tool functions, you can use the `@track` decorator to trace specific steps within the tool execution:
```python
from agents import Agent, Runner, function_tool, set_trace_processors
from opik.integrations.openai.agents import OpikTracingProcessor
from opik import track
set_trace_processors(processors=[OpikTracingProcessor()])
@track(name="fetch_user_data")
def fetch_user_data(user_id: str) -> dict:
    # This step will be traced separately
    return {"user_id": user_id, "preferences": ["python", "ml"]}
@track(name="generate_recommendations")
def generate_recommendations(preferences: list) -> str:
    # This step will also be traced separately
    return f"Based on your interests in {', '.join(preferences)}, we recommend..."
@function_tool
def get_personalized_advice(user_id: str) -> str:
    """Get personalized learning advice for a user."""
    # Each tracked function call inside the tool will appear as a separate span
    user_data = fetch_user_data(user_id)
    recommendations = generate_recommendations(user_data["preferences"])
    return recommendations
agent = Agent(
name="PersonalizedTutor",
instructions="Help users with personalized learning advice.",
model="gpt-4o-mini",
tools=[get_personalized_advice]
)
result = Runner.run_sync(agent, "Give me learning advice for user_123")
print(result.final_output)
```
## Logging threads
When you are running multi-turn conversations with OpenAI Agents using the [OpenAI Agents trace API](https://openai.github.io/openai-agents-python/running_agents/#conversationschat-threads), the Opik integration automatically uses the trace `group_id` as the thread ID so you can easily review the conversation inside Opik. Here is an example:
```python
import asyncio
import uuid
from agents import Agent, Runner, trace
async def main():
    agent = Agent(name="Assistant", instructions="Reply very concisely.")
    thread_id = str(uuid.uuid4())
    with trace(workflow_name="Conversation", group_id=thread_id):
        # First turn
        result = await Runner.run(agent, "What city is the Golden Gate Bridge in?")
        print(result.final_output)
        # San Francisco
        # Second turn
        new_input = result.to_input_list() + [{"role": "user", "content": "What state is it in?"}]
        result = await Runner.run(agent, new_input)
        print(result.final_output)
        # California
asyncio.run(main())
```
## Further improvements
OpenAI Agents is still a relatively new framework and we are working on a couple of improvements:
1. Improved rendering of the inputs and outputs for the LLM calls as part of our `Pretty Mode` functionality
2. Improving the naming conventions for spans
3. Adding the agent execution input and output at a trace level
If there are any additional improvements you would like us to make, feel free to open an issue on our [GitHub repository](https://github.com/comet-ml/opik/issues).
***
description: Start here to integrate Opik into your Pipecat-based real-time voice agent application for end-to-end LLM observability, unit testing, and optimization.
headline: Pipecat | Opik Documentation
og:description: Learn to integrate Opik with Pipecat for real-time monitoring and tracing of voice agents, enhancing observability in AI systems.
og:site_name: Opik Documentation
og:title: Integrate Opik with Pipecat for Enhanced AI
title: Observability for Pipecat with Opik
---------------------
[Pipecat](https://github.com/pipecat-ai/pipecat) is an open-source Python framework for building real-time voice and multimodal conversational AI agents. Developed by Daily, it enables fully programmable AI voice agents and supports multimodal interactions, positioning itself as a flexible solution for developers looking to build conversational AI systems.
This guide explains how to integrate Opik with Pipecat for observability and tracing of real-time voice agents, enabling you to monitor, debug, and optimize your Pipecat agents in the Opik dashboard.
## Account Setup
[Comet](https://www.comet.com/site?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=pipecat\&utm_campaign=opik) provides a hosted version of the Opik platform, [simply create an account](https://www.comet.com/signup?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=pipecat\&utm_campaign=opik) and grab your API Key.
> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/overview/?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=pipecat\&utm_campaign=opik) for more information.
## Getting started
To use the Pipecat integration with Opik, you will need to have Pipecat and the required OpenTelemetry packages installed:
```bash
pip install "pipecat-ai[daily,webrtc,silero,cartesia,deepgram,openai,tracing]" opentelemetry-exporter-otlp-proto-http websockets
```
## Logging threads
You can group multiple agent calls into a conversation thread by setting `thread_id` as a span attribute on the root Logfire span. Opik's OTEL ingestion recognizes this attribute and maps it directly to the trace's `thread_id` field:
```python
# Logfire wraps OTEL - thread_id becomes a span attribute automatically
with logfire.span("chat_turn", thread_id=thread_id):
    result = agent.run_sync("What is machine learning?")
```
## Further improvements
If you would like to see us improve this integration, simply open a new feature
request on [GitHub](https://github.com/comet-ml/opik/issues).
***
description: Start here to integrate Opik into your Semantic Kernel-based genai application for end-to-end LLM observability, unit testing, and optimization.
headline: Semantic Kernel | Opik Documentation
og:description: Leverage Semantic Kernel to integrate LLMs with languages like C# and Python, enabling rapid development of enterprise-grade AI solutions.
og:site_name: Opik Documentation
og:title: Build AI Applications with Semantic Kernel - Opik
title: Observability for Semantic Kernel (Python) with Opik
---------------------
[Semantic Kernel](https://github.com/microsoft/semantic-kernel) is a powerful open-source SDK from Microsoft. It facilitates the combination of LLMs with popular programming languages like C#, Python, and Java. Semantic Kernel empowers developers to build sophisticated AI applications by seamlessly integrating AI services, data sources, and custom logic, accelerating the delivery of enterprise-grade AI solutions.
Learn more about Semantic Kernel in the [official documentation](https://learn.microsoft.com/en-us/semantic-kernel/overview/).

## Getting started
To use the Semantic Kernel integration with Opik, you will need to have Semantic Kernel and the required OpenTelemetry packages installed:
```bash
pip install semantic-kernel opentelemetry-exporter-otlp-proto-http
```
## Environment configuration
Configure your environment variables based on your Opik deployment:
## Further improvements
If you would like to see us improve this integration, simply open a new feature
request on [GitHub](https://github.com/comet-ml/opik/issues).
***
description: Start here to integrate Opik into your Strands Agents-based genai application for end-to-end LLM observability, unit testing, and optimization.
headline: Strands Agents | Opik Documentation
og:description: Explore how to build scalable AI agents with Strands Agents SDK, enabling seamless development from simple chats to complex workflows.
og:site_name: Opik Documentation
og:title: Build AI Agents with Strands - Opik
title: Observability for Strands Agents with Opik
---------------------
[Strands Agents](https://github.com/strands-agents/sdk-python) is a simple yet powerful SDK that takes a model-driven approach to building and running AI agents.
The framework's primary advantage is its ability to scale from simple conversational assistants to complex autonomous workflows, supporting both local development and production deployment with built-in observability.
After running your Strands Agents workflow with the OpenTelemetry configuration described below, you'll see detailed traces in the Opik UI showing agent interactions, model calls, and conversation flows.
## Getting started
To use the Strands Agents integration with Opik, you will need to have Strands Agents and the required OpenTelemetry packages installed:
```bash
pip install --upgrade "strands-agents" "strands-agents-tools" opentelemetry-sdk opentelemetry-exporter-otlp
```
In addition, you will need to set the following environment variables to
configure the OpenTelemetry integration:
## Advanced Usage
### Using with the `@track` decorator
If you have multiple steps in your LLM pipeline, you can use the `@track` decorator to log the traces for each step. If Anthropic is called within one of these steps, the LLM call will be associated with that corresponding step:
```python
import os
import anthropic
from opik import track
from opik.integrations.anthropic import track_anthropic
os.environ["OPIK_PROJECT_NAME"] = "anthropic-integration-demo"
anthropic_client = anthropic.Anthropic()
anthropic_client = track_anthropic(anthropic_client)
@track
def generate_story(prompt):
    res = anthropic_client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return res.content[0].text
@track
def generate_topic():
    prompt = "Generate a topic for a story about Opik."
    res = anthropic_client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return res.content[0].text
@track
def generate_opik_story():
    topic = generate_topic()
    story = generate_story(topic)
    return story
# Execute the multi-step pipeline
generate_opik_story()
```
The trace can now be viewed in the UI with hierarchical spans showing the relationship between different steps:
## Cost Tracking
The `track_anthropic` wrapper automatically tracks token usage and cost for all supported Anthropic models.
Cost information is automatically captured and displayed in the Opik UI, including:
* Token usage details
* Cost per request based on Anthropic pricing
* Total trace cost
### Invoke Model API (Model-Specific Formats)
The Invoke Model API uses model-specific request and response formats. Here are examples for different providers:
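The provider-specific examples are not reproduced here, but as one hedged sketch (an Anthropic Claude model on Bedrock, whose request body follows Anthropic's Messages format; the model ID and region are illustrative):
```python
import json

import boto3
from opik.integrations.bedrock import track_bedrock

bedrock_client = track_bedrock(boto3.client("bedrock-runtime", region_name="us-east-1"))

# Claude models on Bedrock expect Anthropic's Messages API request body
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [{"role": "user", "content": "Hello, Bedrock!"}],
})
response = bedrock_client.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    body=body,
)
print(json.loads(response["body"].read()))
```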
### Invoke Model Stream API
The `invoke_model_with_response_stream` method supports streaming with model-specific formats:
## Cost Tracking
The `track_bedrock` wrapper automatically tracks token usage and cost for all supported AWS Bedrock models, regardless of whether you use the Converse API or the Invoke Model API.
## Using Cohere within a tracked function
If you are using Cohere within a function tracked with the [`@track`](/tracing/advanced/log_traces#using-function-decorators) decorator, you can use the tracked client as normal:
```python
from opik import track
from opik.integrations.openai import track_openai
from openai import OpenAI
import os
client = OpenAI(
    api_key=os.environ.get("COHERE_API_KEY"),
    base_url="https://api.cohere.ai/compatibility/v1"
)
tracked_client = track_openai(client)
@track
def generate_story(prompt):
    response = tracked_client.chat.completions.create(
        model="command-r7b-12-2024",
        messages=[
            {"role": "user", "content": prompt}
        ]
    )
    return response.choices[0].message.content
@track
def generate_topic():
    prompt = "Generate a topic for a story about Opik."
    response = tracked_client.chat.completions.create(
        model="command-r7b-12-2024",
        messages=[
            {"role": "user", "content": prompt}
        ]
    )
    return response.choices[0].message.content
@track
def generate_opik_story():
    topic = generate_topic()
    story = generate_story(topic)
    return story
generate_opik_story()
```
## Supported Cohere models
The `track_openai` wrapper with Cohere's compatibility API supports the following Cohere models:
* `command-r7b-12-2024` - Command R 7B model
* `command-r-plus` - Command R Plus model
* `command-r` - Command R model
* `command-light` - Command Light model
* `command` - Command model
## Supported OpenAI methods
The `track_openai` wrapper supports the following OpenAI methods when used with Cohere:
* `client.chat.completions.create()`, including support for stream=True mode
* `client.beta.chat.completions.parse()`
* `client.beta.chat.completions.stream()`
* `client.responses.create()`
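For instance, streaming also works with the tracked client from the example above (a minimal sketch):
```python
stream = tracked_client.chat.completions.create(
    model="command-r7b-12-2024",
    messages=[{"role": "user", "content": "Tell me a short story about Opik."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```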
If you would like to track another OpenAI method, please let us know by opening an issue on [GitHub](https://github.com/comet-ml/opik/issues).
***
description: Start here to integrate Opik into your DeepSeek-based genai application for end-to-end LLM observability, unit testing, and optimization.
headline: DeepSeek | Opik Documentation
og:description: Learn how to track DeepSeek calls using Opik with multiple hosting options like DeepSeek API, Fireworks AI API, and Together AI API.
og:site_name: Opik Documentation
og:title: Integrate DeepSeek with Opik for Enhanced Tracking
title: Observability for DeepSeek with Opik
---------------------
DeepSeek is an open-source LLM that rivals OpenAI's o1. You can learn more about DeepSeek on [GitHub](https://github.com/deepseek-ai/DeepSeek-R1) or on [deepseek.com](https://www.deepseek.com/).
In this guide, we will showcase how to track DeepSeek calls using Opik. As DeepSeek is open source, there are many ways to run and call the model. We will focus on how to integrate Opik with the following hosting options:
1. DeepSeek API
2. Fireworks AI API
3. Together AI API
## Getting started
### Configuring your hosting provider
Before you can start tracking DeepSeek calls, you need to get the API key from your hosting provider.
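For example, you can expose the key through an environment variable before running your application (the variable name below is a convention for the DeepSeek API; Fireworks AI and Together AI keys are handled the same way with their own variables):

```bash
export DEEPSEEK_API_KEY="YOUR_API_KEY"
```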
## Using with VertexAI
To use Opik with VertexAI, configure the `google-genai` client for VertexAI and wrap it with `track_genai`:
```python
import os
from google import genai
from opik.integrations.genai import track_genai
# Configure for VertexAI
PROJECT_ID = "your-project-id"
LOCATION = "us-central1"
client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
vertexai_client = track_genai(client)
# Set project name for organization
os.environ["OPIK_PROJECT_NAME"] = "vertexai-integration-demo"
# Use the wrapped client
response = vertexai_client.models.generate_content(
model="gemini-2.0-flash-001",
contents="Write a short story about AI observability."
)
print(response.text)
```
## Advanced Usage
### Using with the `@track` decorator
If you have multiple steps in your LLM pipeline, you can use the `@track` decorator to log the traces for each step. If Gemini is called within one of these steps, the LLM call will be associated with that corresponding step:
```python
from opik import track
@track
def generate_story(prompt):
response = gemini_client.models.generate_content(
model="gemini-2.0-flash-001", contents=prompt
)
return response.text
@track
def generate_topic():
prompt = "Generate a topic for a story about Opik."
response = gemini_client.models.generate_content(
model="gemini-2.0-flash-001", contents=prompt
)
return response.text
@track
def generate_opik_story():
topic = generate_topic()
story = generate_story(topic)
return story
# Execute the multi-step pipeline
generate_opik_story()
```
The trace can now be viewed in the UI with hierarchical spans showing the relationship between different steps.
## Multimodal Content Attachments
The `track_genai` wrapper automatically logs multimodal content parts (images, audio, video) as attachments in your traces. When you send images or other media to Gemini models, they are captured and viewable directly in the Opik UI alongside your trace data.
This makes it easy to:
* Review the exact media content sent to the model
* Debug multimodal prompts
* Audit model inputs for compliance
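As a minimal sketch of logging a multimodal call (assuming a local JPEG file and the wrapped `gemini_client` from the earlier examples; the part helper comes from the `google-genai` SDK):

```python
from google.genai import types

with open("photo.jpg", "rb") as f:
    image_bytes = f.read()

response = gemini_client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "Describe what is in this photo.",
    ],
)
print(response.text)
```

The image part is then logged as an attachment on the resulting trace.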
## Video Generation (Veo)
The `track_genai` wrapper also supports Google's Veo video generation API. When you generate videos, Opik automatically tracks the video creation process and logs the generated video as an attachment when you save it.
```python
import os
import time
import opik
from opik import track, opik_context
from opik.integrations.genai import track_genai
import google.genai as genai
from google.genai.types import HttpOptions, GenerateVideosConfig
os.environ["OPIK_PROJECT_NAME"] = "genai-video-demo"
# Configure for VertexAI (required for Veo)
client = genai.Client(
vertexai=True,
http_options=HttpOptions(api_version="v1"),
)
genai_client = track_genai(client)
@track
def generate_video(
prompt: str,
number_of_videos: int = 1,
duration_seconds: int = 4,
resolution: str = "720p",
generate_audio: bool = False,
) -> dict:
"""Generate a video using Google's Veo model."""
# Create video
operation = genai_client.models.generate_videos(
model="veo-3.1-fast-generate-preview",
prompt=prompt,
config=GenerateVideosConfig(
duration_seconds=duration_seconds,
resolution=resolution,
generate_audio=generate_audio,
number_of_videos=number_of_videos,
),
)
# Wait for completion
with opik.start_as_current_span(name="wait_for_completion") as span:
while not operation.done:
time.sleep(10)
operation = genai_client.operations.get(operation)
result = {"name": operation.name, "done": operation.done}
opik_context.update_current_span(output=result)
# Download all videos if generation succeeded
if operation.response and operation.response.generated_videos:
output_paths = []
for i, generated_video in enumerate(operation.response.generated_videos):
output_path = f"output_video_{i}.mp4"
generated_video.video.save(output_path)
output_paths.append(output_path)
result["output_paths"] = output_paths
return result
# Generate videos
generate_video("A golden retriever playing in the snow", number_of_videos=2)
```
The trace will show the full video generation workflow, including the video creation, polling, and the generated video as an attachment.
## Cost Tracking
The `track_genai` wrapper automatically tracks token usage and cost for all supported Google AI models.
Cost information is automatically captured and displayed in the Opik UI, including:
* Token usage details
* Cost per request based on Google AI pricing
* Total trace cost
## Advanced Usage
### Using with the `@track` decorator
If you are using LiteLLM within a function tracked with the [`@track`](/tracing/advanced/log_traces#using-function-decorators) decorator, you will need to pass the `current_span_data` as metadata to the `litellm.completion` call:
```python
from opik import track
from opik.opik_context import get_current_span_data
import litellm
@track
def generate_story(prompt):
response = litellm.completion(
model="groq/llama3-8b-8192",
messages=[{"role": "user", "content": prompt}],
metadata={
"opik": {
"current_span_data": get_current_span_data(),
},
},
)
return response.choices[0].message.content
@track
def generate_topic():
prompt = "Generate a topic for a story about Opik."
response = litellm.completion(
model="groq/llama3-8b-8192",
messages=[{"role": "user", "content": prompt}],
metadata={
"opik": {
"current_span_data": get_current_span_data(),
},
},
)
return response.choices[0].message.content
@track
def generate_opik_story():
topic = generate_topic()
story = generate_story(topic)
return story
# Execute the multi-step pipeline
generate_opik_story()
```
***
description: Start here to integrate Opik into your Mistral AI-based genai application for end-to-end LLM observability, unit testing, and optimization.
headline: Mistral AI | Opik Documentation
og:description: Learn to integrate Mistral AI with Opik via LiteLLM to track and evaluate API calls effectively in your projects.
og:site_name: Opik Documentation
og:title: Integrate Mistral AI with Opik - Streamline Your Workflow
title: Observability for Mistral AI with Opik
---------------------
[Mistral AI](https://mistral.ai/) provides cutting-edge large language models with excellent performance for text generation, reasoning, and specialized tasks like code generation.
This guide explains how to integrate Opik with Mistral AI via LiteLLM. By using the LiteLLM integration provided by Opik, you can easily track and evaluate your Mistral API calls within your Opik projects as Opik will automatically log the input prompt, model used, token usage, and response generated.
## Getting Started
### Configuring Opik
To start tracking your Mistral AI LLM calls, you'll need to have both `opik` and `litellm` installed. You can install them using pip:
```bash
pip install opik litellm
```
In addition, you can configure Opik using the `opik configure` command, which will prompt you for your local server address or, if you are using the Cloud platform, for your API key:
```bash
opik configure
```
### Configuring Mistral AI
You'll need to set your Mistral AI API key as an environment variable:
```bash
export MISTRAL_API_KEY="YOUR_API_KEY"
```
## Logging LLM calls
To log LLM calls to Opik, create the `OpikLogger` callback and add it to LiteLLM. Once it is registered, you can make calls to LiteLLM as you normally would:
```python
from litellm.integrations.opik.opik import OpikLogger
import litellm
opik_logger = OpikLogger()
litellm.callbacks = [opik_logger]
response = litellm.completion(
model="mistral/mistral-large-2407",
messages=[
{"role": "user", "content": "Why is tracking and evaluation of LLMs important?"}
]
)
```
## Logging LLM calls within a tracked function
If you are using LiteLLM within a function tracked with the [`@track`](/tracing/advanced/log_traces#using-function-decorators) decorator, you will need to pass the `current_span_data` as metadata to the `litellm.completion` call:
```python
from opik import track, opik_context
import litellm
@track
def generate_story(prompt):
response = litellm.completion(
model="mistral/mistral-large-2407",
messages=[{"role": "user", "content": prompt}],
metadata={
"opik": {
"current_span_data": opik_context.get_current_span_data(),
},
},
)
return response.choices[0].message.content
@track
def generate_topic():
prompt = "Generate a topic for a story about Opik."
response = litellm.completion(
model="mistral/mistral-medium-2312",
messages=[{"role": "user", "content": prompt}],
metadata={
"opik": {
"current_span_data": opik_context.get_current_span_data(),
},
},
)
return response.choices[0].message.content
@track
def generate_opik_story():
topic = generate_topic()
story = generate_story(topic)
return story
generate_opik_story()
```
***
description: Start here to integrate Opik into your Novita AI-based genai application for end-to-end LLM observability, unit testing, and optimization.
headline: Novita AI | Opik Documentation
og:description: Learn to integrate Opik with Novita AI using LiteLLM to track and evaluate your API calls effectively within your projects.
og:site_name: Opik Documentation
og:title: Integrate Novita AI Models with Opik - Opik
title: Observability for Novita AI with Opik
---------------------
***
description: Start here to integrate Opik into your OpenAI-based genai application for end-to-end LLM observability, unit testing, and optimization.
headline: OpenAI | Opik Documentation
og:description: Learn how to seamlessly track and evaluate OpenAI API calls in your Opik projects using the track_openai method.
og:site_name: Opik Documentation
og:title: Integrate OpenAI with Opik for Effective Tracking
title: Observability for OpenAI (Python) with Opik
---------------------
## Advanced Usage
### Using with the `@track` decorator
If you have multiple steps in your LLM pipeline, you can use the `@track` decorator to log the traces for each step. If OpenAI is called within one of these steps, the LLM call will be associated with that corresponding step:
```python
import os
from opik import track
from opik.integrations.openai import track_openai
from openai import OpenAI
os.environ["OPIK_PROJECT_NAME"] = "openai-integration-demo"
client = OpenAI()
openai_client = track_openai(client)
@track
def generate_story(prompt):
res = openai_client.chat.completions.create(
model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]
)
return res.choices[0].message.content
@track
def generate_topic():
prompt = "Generate a topic for a story about Opik."
res = openai_client.chat.completions.create(
model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]
)
return res.choices[0].message.content
@track
def generate_opik_story():
topic = generate_topic()
story = generate_story(topic)
return story
# Execute the multi-step pipeline
generate_opik_story()
```
The trace can now be viewed in the UI with hierarchical spans showing the relationship between different steps.
## Using Azure OpenAI
The OpenAI integration also supports Azure OpenAI Services. To use Azure OpenAI, initialize your client with Azure configuration and use it with `track_openai` just like the standard OpenAI client:
```python
from opik.integrations.openai import track_openai
from openai import AzureOpenAI
# gets the API Key from environment variable AZURE_OPENAI_API_KEY
azure_client = AzureOpenAI(
# https://learn.microsoft.com/azure/ai-services/openai/reference#rest-api-versioning
api_version="2023-07-01-preview",
# https://learn.microsoft.com/azure/cognitive-services/openai/how-to/create-resource?pivots=web-portal#create-a-resource
azure_endpoint="https://example-endpoint.openai.azure.com",
)
azure_client = track_openai(azure_client)
completion = azure_client.chat.completions.create(
model="deployment-name", # e.g. gpt-35-instant
messages=[
{
"role": "user",
"content": "How do I output all files in a directory using Python?",
},
],
)
```
## Cost Tracking
The `track_openai` wrapper automatically tracks token usage and cost for all supported OpenAI models.
Cost information is automatically captured and displayed in the Opik UI, including:
* Token usage details
* Cost per request based on OpenAI pricing
* Total trace cost
## Advanced Usage
### SequentialChain Example
Now, let's create a more complex chain and run it with Opik tracing:
```python
from langchain.chains import LLMChain, SimpleSequentialChain
from langchain_core.prompts import PromptTemplate
# Synopsis chain
template = """You are a playwright. Given the title of play, it is your job to write a synopsis for that title.
Title: {title}
Playwright: This is a synopsis for the above play:"""
prompt_template = PromptTemplate(input_variables=["title"], template=template)
synopsis_chain = LLMChain(llm=model, prompt=prompt_template)
# Review chain
template = """You are a play critic from the New York Times. Given the synopsis of play, it is your job to write a review for that play.
Play Synopsis:
{synopsis}
Review from a New York Times play critic of the above play:"""
prompt_template = PromptTemplate(input_variables=["synopsis"], template=template)
review_chain = LLMChain(llm=model, prompt=prompt_template)
# Overall chain
overall_chain = SimpleSequentialChain(
chains=[synopsis_chain, review_chain], verbose=True
)
# Run the chain with Opik tracing
review = overall_chain.run("Tragedy at sunset on the beach", callbacks=[opik_tracer])
print(review)
```
### Accessing Logged Traces
We can access the trace IDs collected by the Opik tracer:
```python
traces = opik_tracer.created_traces()
print("Collected trace IDs:", [trace.id for trace in traces])
# Flush traces to ensure all data is logged
opik_tracer.flush()
```
### Fine-tuned LLM Example
Finally, let's use a fine-tuned model with Opik tracing:
**Note:** In order to use a fine-tuned model, you will need to have access to the model and the correct model ID. The code below will return a `NotFoundError` unless the `model` and `adapter_id` are updated.
```python
fine_tuned_model = Predibase(
model="my-base-LLM",
predibase_api_key=os.environ.get("PREDIBASE_API_TOKEN"),
predibase_sdk_version=None,
adapter_id="my-finetuned-adapter-id",
adapter_version=1,
**{
"api_token": os.environ.get("HUGGING_FACE_HUB_TOKEN"),
"max_new_tokens": 5,
},
)
# Configure the Opik tracer
fine_tuned_model = fine_tuned_model.with_config({"callbacks": [opik_tracer]})
# Invoke the fine-tuned model
response = fine_tuned_model.invoke(
"Can you help categorize the following emails into positive, negative, and neutral?",
**{"temperature": 0.5, "max_new_tokens": 1024},
)
print(response)
# Final flush to ensure all traces are logged
opik_tracer.flush()
```
## Tracking your fine-tuning training runs
If you are using Predibase to fine-tune an LLM, we recommend using Predibase's integration with Comet's Experiment Management functionality. You can learn more about how to set this up in the [Comet integration guide](https://docs.predibase.com/integrations/comet) in the Predibase documentation. If you are already using an experiment tracking platform, it is worth checking whether it has an integration with Predibase.
***
description: Start here to integrate Opik into your Together AI-based genai application for end-to-end LLM observability, unit testing, and optimization.
headline: Together AI | Opik Documentation
og:description: Configure Opik to track Together AI calls, enabling easy evaluation of models like Llama and Mistral in your projects.
og:site_name: Opik Documentation
og:title: Integrate Together AI with Opik - Fast Inference
title: Observability for Together AI with Opik
---------------------
[Together AI](https://www.together.ai/) provides fast inference for leading open-source models including Llama, Mistral, Qwen, and many others.
This guide explains how to integrate Opik with Together AI via LiteLLM. By using the LiteLLM integration provided by Opik, you can easily track and evaluate your Together AI calls within your Opik projects as Opik will automatically log the input prompt, model used, token usage, and response generated.
## Getting Started
### Configuring Opik
To start tracking your Together AI calls, you'll need to have both `opik` and `litellm` installed. You can install them using pip:
```bash
pip install opik litellm
```
In addition, you can configure Opik using the `opik configure` command, which will prompt you for your local server address or, if you are using the Cloud platform, for your API key:
```bash
opik configure
```
### Configuring Together AI
You'll need to set your Together AI API key as an environment variable:
```bash
export TOGETHER_API_KEY="YOUR_API_KEY"
```
## Logging LLM calls
To log LLM calls to Opik, create the `OpikLogger` callback and add it to LiteLLM. Once it is registered, you can make calls to LiteLLM as you normally would:
```python
from litellm.integrations.opik.opik import OpikLogger
import litellm
opik_logger = OpikLogger()
litellm.callbacks = [opik_logger]
response = litellm.completion(
model="together_ai/meta-llama/Llama-3.2-3B-Instruct-Turbo",
messages=[
{"role": "user", "content": "Why is tracking and evaluation of LLMs important?"}
]
)
```
## Logging LLM calls within a tracked function
If you are using LiteLLM within a function tracked with the [`@track`](/tracing/advanced/log_traces#using-function-decorators) decorator, you will need to pass the `current_span_data` as metadata to the `litellm.completion` call:
```python
from opik import track, opik_context
import litellm
@track
def generate_story(prompt):
response = litellm.completion(
model="together_ai/meta-llama/Llama-3.2-3B-Instruct-Turbo",
messages=[{"role": "user", "content": prompt}],
metadata={
"opik": {
"current_span_data": opik_context.get_current_span_data(),
},
},
)
return response.choices[0].message.content
@track
def generate_topic():
prompt = "Generate a topic for a story about Opik."
response = litellm.completion(
model="together_ai/meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo",
messages=[{"role": "user", "content": prompt}],
metadata={
"opik": {
"current_span_data": opik_context.get_current_span_data(),
},
},
)
return response.choices[0].message.content
@track
def generate_opik_story():
topic = generate_topic()
story = generate_story(topic)
return story
generate_opik_story()
```
***
description: Start here to integrate Opik into your IBM watsonx-based genai application for end-to-end LLM observability, unit testing, and optimization.
headline: WatsonX | Opik Documentation
og:description: Build and deploy AI models effectively with WatsonX using Opik for seamless account setup and API integration.
og:site_name: Opik Documentation
og:title: Train AI Models with Opik - WatsonX Model Providers
title: Observability for IBM watsonx with Opik
---------------------
[watsonx](https://www.ibm.com/products/watsonx-ai) is a next-generation enterprise studio for AI builders to train, validate, tune, and deploy AI models.
## Account Setup
[Comet](https://www.comet.com/site?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=watsonx\&utm_campaign=opik) provides a hosted version of the Opik platform, [simply create an account](https://www.comet.com/signup?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=watsonx\&utm_campaign=opik) and grab your API Key.
> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/overview/?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=watsonx\&utm_campaign=opik) for more information.
## Getting Started
### Installation
To start tracking your watsonx LLM calls, you can use our [LiteLLM integration](/integrations/litellm). You'll need to have both the `opik` and `litellm` packages installed. You can install them using pip:
```bash
pip install opik litellm
```
### Configuring Opik
Configure the Opik Python SDK for your deployment type. See the [Python SDK Configuration guide](/tracing/advanced/sdk_configuration) for detailed instructions on:
* **CLI configuration**: `opik configure`
* **Code configuration**: `opik.configure()`
* **Self-hosted vs Cloud vs Enterprise** setup
* **Configuration files** and environment variables
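Once Opik is configured, watsonx calls can be logged through LiteLLM's `OpikLogger` callback, mirroring the other LiteLLM-based integrations in this guide. A minimal sketch (the credential environment variable names are those expected by LiteLLM's watsonx provider and may differ for your setup):

```python
import os

import litellm
from litellm.integrations.opik.opik import OpikLogger

# watsonx credentials read by LiteLLM (names assumed from LiteLLM's watsonx docs)
os.environ["WATSONX_URL"] = "https://us-south.ml.cloud.ibm.com"
os.environ["WATSONX_APIKEY"] = "YOUR_API_KEY"
os.environ["WATSONX_PROJECT_ID"] = "YOUR_PROJECT_ID"

opik_logger = OpikLogger()
litellm.callbacks = [opik_logger]

response = litellm.completion(
    model="watsonx/ibm/granite-13b-chat-v2",
    messages=[{"role": "user", "content": "Why is tracking and evaluation of LLMs important?"}],
)
```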
## Advanced Usage
### Using with the `@track` decorator
If you have multiple steps in your LLM pipeline, you can use the `@track` decorator to log the traces for each step. If WatsonX is called within one of these steps, the LLM call will be associated with that corresponding step:
```python
from opik import track
from opik.opik_context import get_current_span_data
import litellm
@track
def generate_story(prompt):
response = litellm.completion(
model="watsonx/ibm/granite-13b-chat-v2",
messages=[{"role": "user", "content": prompt}],
metadata={
"opik": {
"current_span_data": get_current_span_data(),
},
},
)
return response.choices[0].message.content
@track
def generate_topic():
prompt = "Generate a topic for a story about Opik."
response = litellm.completion(
model="watsonx/ibm/granite-13b-chat-v2",
messages=[{"role": "user", "content": prompt}],
metadata={
"opik": {
"current_span_data": get_current_span_data(),
},
},
)
return response.choices[0].message.content
@track
def generate_opik_story():
topic = generate_topic()
story = generate_story(topic)
return story
# Execute the multi-step pipeline
generate_opik_story()
```
***
description: Start here to integrate Opik into your xAI Grok-based genai application for end-to-end LLM observability, unit testing, and optimization.
headline: xAI Grok | Opik Documentation
og:description: Learn to integrate Opik with xAI Grok via LiteLLM to efficiently track and evaluate your xAI API calls within your Opik projects.
og:site_name: Opik Documentation
og:title: Integrate xAI Grok with Opik for Enhanced AI Tracking
title: Observability for xAI Grok with Opik
---------------------
## Comprehensive Example: Dataset Evaluation
For more advanced use cases, you can evaluate entire datasets using Ragas metrics with the Opik evaluation platform:
### 1. Create a Dataset
```python
from datasets import load_dataset
import opik
opik_client = opik.Opik()
# Create a small dataset
fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval")
# Reformat the dataset to match the schema expected by the Ragas evaluate function
hf_dataset = fiqa_eval["baseline"].select(range(3))
dataset_items = hf_dataset.map(
lambda x: {
"user_input": x["question"],
"reference": x["ground_truths"][0],
"retrieved_contexts": x["contexts"],
}
)
dataset = opik_client.get_or_create_dataset("ragas-demo-dataset", project_name="my-project")
dataset.insert(dataset_items)
```
### 2. Define Evaluation Task
```python
# Create an evaluation task
def evaluation_task(x):
return {
"user_input": x["question"],
"response": x["answer"],
"retrieved_contexts": x["contexts"],
}
```
### 3. Run Evaluation
```python
# Use the RagasMetricWrapper directly with Opik's evaluate function
opik.evaluation.evaluate(
dataset,
evaluation_task,
scoring_metrics=[answer_relevancy_metric],
task_threads=1,
)
```
### 4. Alternative: Using Ragas Native Evaluation
You can also use Ragas' native evaluation function with Opik tracing:
```python
from datasets import load_dataset
from opik.integrations.langchain import OpikTracer
from ragas.metrics import context_precision, answer_relevancy, faithfulness
from ragas import evaluate
fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval")
# Reformat the dataset to match the schema expected by the Ragas evaluate function
dataset = fiqa_eval["baseline"].select(range(3))
dataset = dataset.map(
lambda x: {
"user_input": x["question"],
"reference": x["ground_truths"][0],
"retrieved_contexts": x["contexts"],
}
)
opik_tracer_eval = OpikTracer(tags=["ragas_eval"], metadata={"evaluation_run": True})
result = evaluate(
dataset,
metrics=[context_precision, faithfulness, answer_relevancy],
callbacks=[opik_tracer_eval],
)
print(result)
```
## Using Ragas metrics to evaluate a RAG pipeline
The `RagasMetricWrapper` can also be used directly within the Opik evaluation platform. This approach is much simpler than creating custom wrappers:
### 1. Define the Ragas metric
We will start by defining the Ragas metric, in this example we will use `AnswerRelevancy`:
```python
from ragas.metrics import AnswerRelevancy
from langchain_openai.chat_models import ChatOpenAI
from langchain_openai.embeddings import OpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from opik.evaluation.metrics import RagasMetricWrapper
# Initialize the Ragas metric
llm = LangchainLLMWrapper(ChatOpenAI())
emb = LangchainEmbeddingsWrapper(OpenAIEmbeddings())
ragas_answer_relevancy = AnswerRelevancy(llm=llm, embeddings=emb)
```
### 2. Create the metric wrapper
Simply wrap the Ragas metric with `RagasMetricWrapper`:
```python
# Create the answer relevancy scoring metric
answer_relevancy = RagasMetricWrapper(
ragas_answer_relevancy,
track=True # Enable tracing for the metric computation
)
```
## Enterprise Support
For more information about the Opik Kong plugin, please contact our support team.
## Further Improvements
If you have suggestions for improving the Kong AI Gateway integration, please let us know by opening an issue on [GitHub](https://github.com/comet-ml/opik/issues).
***
description: Start here to integrate Opik into your AISuite-based genai application for end-to-end LLM observability, unit testing, and optimization.
headline: AISuite | Opik Documentation
og:description: Learn to integrate Opik with aisuite SDK to track API calls, log prompts, model usage, and responses effectively.
og:site_name: Opik Documentation
og:title: Integrate AISuite with Opik for Seamless Tracking
title: Observability for AISuite with Opik
---------------------
This guide explains how to integrate Opik with the aisuite Python SDK. By using the `track_aisuite` method provided by Opik, you can easily track and evaluate your aisuite API calls within your Opik projects, as Opik will automatically log the input prompt, model used, token usage, and response generated.
## Account Setup
[Comet](https://www.comet.com/site?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=aisuite\&utm_campaign=opik) provides a hosted version of the Opik platform, [simply create an account](https://www.comet.com/signup?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=aisuite\&utm_campaign=opik) and grab your API Key.
> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/overview/?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=aisuite\&utm_campaign=opik) for more information.
## Getting Started
### Installation
First, ensure you have both `opik` and `aisuite` packages installed:
```bash
pip install opik "aisuite[openai]"
```
### Configuring Opik
Configure the Opik Python SDK for your deployment type. See the [Python SDK Configuration guide](/tracing/advanced/sdk_configuration) for detailed instructions on:
* **CLI configuration**: `opik configure`
* **Code configuration**: `opik.configure()`
* **Self-hosted vs Cloud vs Enterprise** setup
* **Configuration files** and environment variables
### Configuring AISuite
In order to configure AISuite, you will need to have your OpenAI API Key. You can [find or create your OpenAI API Key in this page](https://platform.openai.com/settings/organization/api-keys).
You can set it as an environment variable:
```bash
export OPENAI_API_KEY="YOUR_API_KEY"
```
Or set it programmatically:
```python
import os
import getpass
if "OPENAI_API_KEY" not in os.environ:
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")
```
## Logging LLM calls
In order to log the LLM calls to Opik, you will need to wrap the AISuite client with `track_aisuite`. All calls made with the wrapped client will then be logged to Opik:
```python
from opik.integrations.aisuite import track_aisuite
import aisuite as ai
client = track_aisuite(ai.Client(), project_name="aisuite-integration-demo")
messages = [
{"role": "user", "content": "Write a short two sentence story about Opik."},
]
response = client.chat.completions.create(
model="openai:gpt-4o",
messages=messages,
temperature=0.75
)
print(response.choices[0].message.content)
```
## Advanced Usage
### Using with the `@track` decorator
If you have multiple steps in your LLM pipeline, you can use the `@track` decorator to log the traces for each step. If AISuite is called within one of these steps, the LLM call will be associated with that corresponding step:
```python
from opik import track
from opik.integrations.aisuite import track_aisuite
import aisuite as ai
client = track_aisuite(ai.Client(), project_name="aisuite-integration-demo")
@track
def generate_story(prompt):
res = client.chat.completions.create(
model="openai:gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]
)
return res.choices[0].message.content
@track
def generate_topic():
prompt = "Generate a topic for a story about Opik."
res = client.chat.completions.create(
model="openai:gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]
)
return res.choices[0].message.content
@track(project_name="aisuite-integration-demo")
def generate_opik_story():
topic = generate_topic()
story = generate_story(topic)
return story
# Execute the multi-step pipeline
generate_opik_story()
```
The trace can now be viewed in the UI with hierarchical spans showing the relationship between different steps.
## Supported aisuite methods
The `track_aisuite` wrapper supports the following aisuite methods:
* `aisuite.Client.chat.completions.create()`
If you would like to track another aisuite method, please let us know by opening an issue on [GitHub](https://github.com/comet-ml/opik/issues).
***
description: Start here to integrate Opik into your Anannas AI-based genai application for end-to-end LLM observability, unit testing, and optimization.
headline: Anannas AI | Opik Documentation
og:description: Learn to integrate Anannas AI with Opik using the OpenAI SDK wrapper to log all LLM calls for comprehensive observability.
og:site_name: Opik Documentation
og:title: Integrate Anannas AI with Opik for Unified LLM Access
title: Observability for Anannas AI with Opik
---------------------
[Anannas AI](https://anannas.ai) is a unified inference gateway providing access to 500+ models (OpenAI, Anthropic, Mistral, Gemini, DeepSeek, and more) through an OpenAI-compatible API.
## Gateway Overview
Anannas AI provides a unified interface for accessing hundreds of LLM models through a single OpenAI-compatible API, making it easy to switch between providers and models without changing your code.
**Key Features:**
* **500+ Models**: Single API for accessing models from OpenAI, Anthropic, Mistral, Gemini, DeepSeek, and more
* **OpenAI-Compatible API**: Drop-in replacement for OpenAI SDK with standard request/response format
* **Provider Health Monitoring**: Automatic fallback routing in case of failures or degraded performance
* **Low Overhead**: \~0.48ms latency overhead with 5% markup
* **BYOK Support**: Bring Your Own Key for enterprise deployments
## Account Setup
[Comet](https://www.comet.com/site?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=anannas\&utm_campaign=opik) provides a hosted version of the Opik platform. [Simply create an account](https://www.comet.com/signup?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=anannas\&utm_campaign=opik) and grab your API Key.
> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/overview/?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=anannas\&utm_campaign=opik) for more information.
## Getting Started
### Installation
First, ensure you have both `opik` and `openai` packages installed:
```bash
pip install opik openai
```
### Configuring Opik
Configure the Opik Python SDK for your deployment type. See the [Python SDK Configuration guide](/tracing/advanced/sdk_configuration) for detailed instructions on:
* **CLI configuration**: `opik configure`
* **Code configuration**: `opik.configure()`
* **Self-hosted vs Cloud vs Enterprise** setup
* **Configuration files** and environment variables
### Configuring Anannas AI
You'll need an Anannas AI API key. You can get one from [Anannas AI](https://anannas.ai).
Set it as an environment variable:
```bash
export ANANNAS_API_KEY="YOUR_ANANNAS_API_KEY"
```
Or set it programmatically:
```python
import os
import getpass
if "ANANNAS_API_KEY" not in os.environ:
os.environ["ANANNAS_API_KEY"] = getpass.getpass("Enter your Anannas AI API key: ")
```
## Logging LLM Calls
Since Anannas AI provides an OpenAI-compatible API, we can use the [Opik OpenAI SDK wrapper](/integrations/openai) to automatically log Anannas AI calls as generations in Opik.
### Simple LLM Call
```python
import os
from opik.integrations.openai import track_openai
from openai import OpenAI
# Create an OpenAI client with Anannas AI's base URL
client = OpenAI(
api_key=os.environ["ANANNAS_API_KEY"],
base_url="https://api.anannas.ai/v1"
)
# Wrap the client with Opik tracking
client = track_openai(client, project_name="anannas-integration-demo")
# Make a chat completion request
response = client.chat.completions.create(
model="anthropic/claude-3-5-sonnet", # You can use any of the 500+ models
messages=[
{"role": "system", "content": "You are a knowledgeable AI assistant."},
{"role": "user", "content": "What are some interesting facts about tropical fruits?"}
]
)
# Print the assistant's reply
print(response.choices[0].message.content)
```
## Advanced Usage
### Using with the `@track` decorator
If you have multiple steps in your LLM pipeline, you can use the `@track` decorator to log the traces for each step. If Anannas AI is called within one of these steps, the LLM call will be associated with that corresponding step:
```python
import os
from opik import track
from opik.integrations.openai import track_openai
from openai import OpenAI
# Create and wrap the OpenAI client with Anannas AI's base URL
client = OpenAI(
api_key=os.environ["ANANNAS_API_KEY"],
base_url="https://api.anannas.ai/v1"
)
client = track_openai(client)
@track
def summarize_text(text: str):
response = client.chat.completions.create(
model="openai/gpt-4o",
messages=[
{"role": "system", "content": "You create concise summaries of text content."},
{"role": "user", "content": f"Please summarize this text:\n{text}"}
]
)
return response.choices[0].message.content
@track
def analyze_sentiment(summary: str):
response = client.chat.completions.create(
model="openai/gpt-4o-mini",
messages=[
{"role": "system", "content": "You perform sentiment analysis on text."},
{"role": "user", "content": f"What is the sentiment of this summary:\n{summary}"}
]
)
return response.choices[0].message.content
@track(project_name="anannas-integration-demo")
def analyze_text(text: str):
# First LLM call: Summarize the text
summary = summarize_text(text)
# Second LLM call: Analyze the sentiment of the summary
sentiment = analyze_sentiment(summary)
return {
"summary": summary,
"sentiment": sentiment
}
# Example usage
text_to_analyze = "Anannas AI provides a unified gateway to access hundreds of LLM models with built-in observability and automatic fallback routing."
result = analyze_text(text_to_analyze)
```
The trace will show nested LLM calls with hierarchical spans.
## Further Improvements
If you have suggestions for improving the Anannas AI integration, please let us know by opening an issue on [GitHub](https://github.com/comet-ml/opik/issues).
***
description: Learn how to integrate Helicone with Opik to log and monitor your LLM traffic using the standard Helicone integration setup.
headline: Helicone | Opik Documentation
og:description: Learn to integrate Helicone with Opik using the OpenAI SDK wrapper to log all LLM calls for comprehensive observability.
og:site_name: Opik Documentation
og:title: Integrate Helicone with Opik for LLM Observability
title: Observability for Helicone with Opik
---------------------
[Helicone](https://www.helicone.ai/) is an open-source LLM observability platform that provides monitoring, logging, and analytics for LLM applications. It acts as a proxy layer between your application and LLM providers, offering features like request logging, caching, rate limiting, and cost tracking.
## Gateway Overview
Helicone provides a comprehensive observability layer for LLM applications with features including:
* **Unified API**: OpenAI-compatible API with access to 100+ models through Helicone's model registry
* **Intelligent Routing**: Automatic failover and fallbacks across providers to ensure reliability
* **No Rate Limits**: Skip provider tier restrictions with zero markup on credits
* **Request Logging**: Automatic logging of all LLM requests and responses
* **Caching**: Reduce costs and improve latency with semantic caching
* **Cost Tracking**: Monitor spending across different models and providers with unified observability
* **Multi-Provider Support**: Works with OpenAI, Anthropic, Azure OpenAI, and more
## Account Setup
[Comet](https://www.comet.com/site?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=helicone\&utm_campaign=opik) provides a hosted version of the Opik platform. [Simply create an account](https://www.comet.com/signup?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=helicone\&utm_campaign=opik) and grab your API Key.
> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/overview/?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=helicone\&utm_campaign=opik) for more information.
## Getting Started
### Installation
First, ensure you have both `opik` and `openai` packages installed:
```bash
pip install opik openai
```
### Configuring Opik
Configure the Opik Python SDK for your deployment type. See the [Python SDK Configuration guide](/tracing/advanced/sdk_configuration) for detailed instructions on:
* **CLI configuration**: `opik configure`
* **Code configuration**: `opik.configure()`
* **Self-hosted vs Cloud vs Enterprise** setup
* **Configuration files** and environment variables
### Configuring Helicone
You'll need a Helicone API key. You can get one by signing up at [Helicone](https://www.helicone.ai/).
Set your API key as an environment variable:
```bash
export HELICONE_API_KEY="YOUR_HELICONE_API_KEY"
```
Or set it programmatically:
```python
import os
import getpass
if "HELICONE_API_KEY" not in os.environ:
os.environ["HELICONE_API_KEY"] = getpass.getpass("Enter your Helicone API key: ")
```
## Logging LLM Calls
Since Helicone provides an OpenAI-compatible proxy, we can use the [Opik OpenAI SDK wrapper](/integrations/openai) to automatically log Helicone calls as generations in Opik.
### Simple LLM Call
```python
import os
from opik.integrations.openai import track_openai
from openai import OpenAI
# Create an OpenAI client with Helicone's base URL
client = OpenAI(
api_key=os.environ["HELICONE_API_KEY"],
base_url="https://ai-gateway.helicone.ai"
)
# Wrap the client with Opik tracking
client = track_openai(client, project_name="helicone-integration-demo")
# Make a chat completion request
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[
{"role": "system", "content": "You are a knowledgeable AI assistant."},
{"role": "user", "content": "What is the largest city in France?"}
]
)
# Print the assistant's reply
print(response.choices[0].message.content)
```
## Advanced Usage
### Using with the `@track` decorator
If you have multiple steps in your LLM pipeline, you can use the `@track` decorator to log the traces for each step. If Helicone is called within one of these steps, the LLM call will be associated with that corresponding step:
```python
import os
from opik import track
from opik.integrations.openai import track_openai
from openai import OpenAI
# Create and wrap the OpenAI client with Helicone's base URL
client = OpenAI(
api_key=os.environ["HELICONE_API_KEY"],
base_url="https://ai-gateway.helicone.ai"
)
client = track_openai(client)
@track
def generate_response(prompt: str):
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[
{"role": "system", "content": "You are a knowledgeable AI assistant."},
{"role": "user", "content": prompt}
]
)
return response.choices[0].message.content
@track
def refine_response(initial_response: str):
response = client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "system", "content": "You enhance and polish text responses."},
{"role": "user", "content": f"Please improve this response: {initial_response}"}
]
)
return response.choices[0].message.content
@track(project_name="helicone-integration-demo")
def generate_and_refine(prompt: str):
# First LLM call: Generate initial response
initial = generate_response(prompt)
# Second LLM call: Refine the response
refined = refine_response(initial)
return refined
# Example usage
result = generate_and_refine("Explain quantum computing in simple terms.")
```
The trace will show nested LLM calls with hierarchical spans.
## Further Improvements
If you have suggestions for improving the Helicone integration, please let us know by opening an issue on [GitHub](https://github.com/comet-ml/opik/issues).
***
description: Start here to integrate Opik into your LiteLLM-based genai application for end-to-end LLM observability, unit testing, and optimization.
headline: LiteLLM | Opik Documentation
og:description: Learn to call LLM APIs with LiteLLM using OpenAI format. Configure using the Python SDK or Proxy Server for seamless integration.
og:site_name: Opik Documentation
og:title: LiteLLM Gateways - Opik Integration
title: Observability for LiteLLM with Opik
---------------------
[LiteLLM](https://github.com/BerriAI/litellm) allows you to call all LLM APIs using the OpenAI format (Bedrock, Hugging Face, VertexAI, TogetherAI, Azure, OpenAI, Groq, etc.). There are two main ways to use LiteLLM:
1. Using the [LiteLLM Python SDK](https://docs.litellm.ai/docs/#litellm-python-sdk)
2. Using the [LiteLLM Proxy Server (LLM Gateway)](https://docs.litellm.ai/docs/#litellm-proxy-server-llm-gateway)
## Account Setup
[Comet](https://www.comet.com/site?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=litellm\&utm_campaign=opik) provides a hosted version of the Opik platform, [simply create an account](https://www.comet.com/signup?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=litellm\&utm_campaign=opik) and grab your API Key.
> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/overview/?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=litellm\&utm_campaign=opik) for more information.
## Getting Started
### Installation
First, ensure you have both `opik` and `litellm` packages installed:
```bash
pip install opik litellm
```
### Configuring Opik
Configure the Opik Python SDK for your deployment type. See the [Python SDK Configuration guide](/tracing/advanced/sdk_configuration) for detailed instructions on:
* **CLI configuration**: `opik configure`
* **Code configuration**: `opik.configure()`
* **Self-hosted vs Cloud vs Enterprise** setup
* **Configuration files** and environment variables
### Configuring LiteLLM
In order to use LiteLLM, you will need to configure your LLM provider API keys. For this example, we'll use OpenAI. You can [find or create your API keys on this page](https://platform.openai.com/settings/organization/api-keys).
You can set them as environment variables:
```bash
export OPENAI_API_KEY="YOUR_API_KEY"
```
Or set them programmatically:
```python
import os
import getpass
if "OPENAI_API_KEY" not in os.environ:
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")
```
## Using Opik with the LiteLLM Python SDK
### Logging LLM calls
To log LLM calls to Opik, create the `OpikLogger` callback and add it to LiteLLM. Once it is registered, you can make calls to LiteLLM as you normally would:
```python
from litellm.integrations.opik.opik import OpikLogger
import litellm
import os
# Set project name for better organization
os.environ["OPIK_PROJECT_NAME"] = "litellm-integration-demo"
opik_logger = OpikLogger()
litellm.callbacks = [opik_logger]
response = litellm.completion(
model="gpt-3.5-turbo",
messages=[
{"role": "user", "content": "Why is tracking and evaluation of LLMs important?"}
]
)
print(response.choices[0].message.content)
```
### Logging LLM calls within a tracked function
If you are using LiteLLM within a function tracked with the [`@track`](/tracing/advanced/log_traces#using-function-decorators) decorator, you will need to pass the `current_span_data` as metadata to the `litellm.completion` call:
```python
from opik import track
from opik.opik_context import get_current_span_data
from litellm.integrations.opik.opik import OpikLogger
import litellm
opik_logger = OpikLogger()
litellm.callbacks = [opik_logger]
@track
def streaming_function(input):
messages = [{"role": "user", "content": input}]
    response = litellm.completion(
        model="gpt-3.5-turbo",
        messages=messages,
        stream=True,  # stream the response so it can be consumed as chunks below
        metadata={
            "opik": {
                "current_span_data": get_current_span_data(),
                "tags": ["streaming-test"],
            },
        },
    )
return response
response = streaming_function("Why is tracking and evaluation of LLMs important?")
chunks = list(response)
```
## Using Opik with the LiteLLM Proxy Server
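When using the proxy server, Opik logging is typically enabled in the LiteLLM `config.yaml` rather than in application code. A minimal sketch, assuming the built-in `opik` success callback available in recent LiteLLM releases:

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo

litellm_settings:
  success_callback: ["opik"]
```

You can then start the gateway with `litellm --config config.yaml`, and requests routed through it will be logged to Opik.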
***
headline: Opik Python SDK | Opik Documentation
og:description: Build powerful applications with Opik's Python SDK, enabling seamless integration and functionality for your projects.
og:site_name: Opik Documentation
og:title: Opik Python SDK - Simplify Your Python Development
title: Overview
---------------------
***
headline: OpenTelemetry Python SDK | Opik Documentation
og:description: Learn to instrument your Python applications with OpenTelemetry SDK to effectively send trace data to Opik for better observability.
og:site_name: Opik Documentation
og:title: Instrument Your Python Apps with OpenTelemetry - Opik
subtitle: How to send data to Opik using the OpenTelemetry Python SDK
title: OpenTelemetry Python SDK
toc_max_heading_level: 4
---------------------
# Using the OpenTelemetry Python SDK
This guide shows you how to directly instrument your Python applications with the OpenTelemetry SDK to send trace data to Opik.
## Installation
First, install the required OpenTelemetry packages:
```bash
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp
```
## Full Example
Here's a complete example that demonstrates how to instrument a chatbot application with OpenTelemetry and send the traces to Opik:
```python
# Dependencies: opentelemetry-exporter-otlp
import os
import time
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.semconv.resource import ResourceAttributes
# Configure OpenTelemetry
# For comet.com
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://www.comet.com/opik/api/v1/private/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = "Authorization=
## Getting started
To use the Microsoft Agent Framework .NET integration with Opik, you will need to have the Agent Framework and the required OpenTelemetry packages installed:
```bash
# Agent Framework (2 packages)
dotnet add package Microsoft.Agents.AI --prerelease
dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease
# Hosting (1 package)
dotnet add package Microsoft.Extensions.Hosting
# OpenTelemetry (3 packages)
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol
dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Instrumentation.Http
```
You will also need to configure your OpenAI API key:
```bash
export OPENAI_API_KEY="YOUR_API_KEY"
```
If you are looking at integrating Opik with your agent using Cursor, check out our pre-built prompt:
## Installation
You can also install the Cursor extension manually by navigating to the `Extensions` tab at the top of the file sidebar and searching for `Opik`.
From there, simply click on the `Install` button.
### Configuring the extension
In order to use the extension, you will need to configure your Opik API key. There are a few ways to do this:
In this modal, click on `Open Settings` and paste your Opik API key in the `Opik: Opik API Key` field.
You can find your API key in the [Opik dashboard](https://www.comet.com/api/my/settings).
***
description: Start here to integrate Opik into your Flowise-based genai application for end-to-end LLM observability, unit testing, and optimization.
headline: Flowise | Opik Documentation
og:description: Create AI agents using Flowise's drag-and-drop interface. Leverage Opik to analyze chatflows and enhance user experiences effectively.
og:site_name: Opik Documentation
og:title: Build AI Workflows with Opik - Flowise
title: Observability for Flowise with Opik
---------------------
Flowise AI is a visual LLM builder that allows you to create AI agents and workflows through a drag-and-drop interface. With Opik integration, you can analyze and troubleshoot your chatflows and agentflows to improve performance and user experience.
## Getting started
### Prerequisites
Before integrating Opik with Langflow, ensure you have:
* A running Langflow server
* An Opik Cloud account or self-hosted Opik instance
* Access to the terminal/environment where Langflow runs
### Installation
Install both Langflow and Opik in the same environment:
```bash
pip install langflow opik
```
For more Langflow installation options and details, see the [Langflow documentation](https://docs.langflow.org/).
## Configuring Opik
Configure Opik to connect to your Opik instance. Run the following command in the same terminal/environment where you'll run Langflow:
## Features
* 🔍 **Automatic tracing** of workflow executions and individual node operations
* 📊 **Standard OpenTelemetry** instrumentation using the official Node.js SDK
* 🎯 **Zero-code setup** via n8n's hook system
* 🔌 **OTLP compatible** - works with Opik's OpenTelemetry endpoint
* ⚙️ **Configurable** I/O capture, node filtering, and more
## Account Setup
[Comet](https://www.comet.com/site?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=n8n\&utm_campaign=opik) provides a hosted version of the Opik platform. [Simply create an account](https://www.comet.com/signup?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=n8n\&utm_campaign=opik) and grab your API Key.
> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/overview/?from=llm\&utm_source=opik\&utm_medium=colab\&utm_content=n8n\&utm_campaign=opik) for more information.
## Quick Start with Docker
The fastest way to get started is with Docker Compose:
```bash
# Clone and navigate to the example
git clone https://github.com/comet-ml/n8n-observability.git
cd n8n-observability/examples/docker-compose
# Set your Opik API key (get one free at https://www.comet.com/signup)
export OPIK_API_KEY=your_api_key_here
# Build and run
docker-compose up --build
```
Open [http://localhost:5678](http://localhost:5678), create a workflow, and see traces in your [Opik dashboard](https://www.comet.com)!
## Setup Options
### Docker (Recommended)
Create a custom Dockerfile that installs the `n8n-observability` package globally:
```dockerfile
FROM n8nio/n8n:latest
USER root
RUN npm install -g n8n-observability
ENV EXTERNAL_HOOK_FILES=/usr/local/lib/node_modules/n8n-observability/dist/hooks.cjs
USER node
```
Then configure your docker-compose.yml with OTLP settings:
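A sketch of the relevant service definition (endpoint and header values are placeholders; the standard OpenTelemetry environment variables are used, pointing at Opik's OTLP endpoint):

```yaml
services:
  n8n:
    build: .
    ports:
      - "5678:5678"
    environment:
      - EXTERNAL_HOOK_FILES=/usr/local/lib/node_modules/n8n-observability/dist/hooks.cjs
      - OTEL_SERVICE_NAME=n8n
      - OTEL_EXPORTER_OTLP_ENDPOINT=https://www.comet.com/opik/api/v1/private/otel
      - OTEL_EXPORTER_OTLP_HEADERS=Authorization=${OPIK_API_KEY},Comet-Workspace=default
```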
The flywheel is the core loop that makes your agent better over time. Each turn through the cycle adds a new test case, fixes a real failure, and verifies the fix before it reaches production. The more you use it, the faster and more reliable it becomes.
## Java Backend Service
Opik's main backend uses Java 21 LTS and Dropwizard, structured as a RESTful web service offering public API
endpoints for core functionality. Full API documentation is available [here](/reference/rest-api/overview).
Key responsibilities:
* **REST API**: Exposes endpoints for traces, spans, experiments, datasets, prompts, feedback scores, and more.
* **Authentication and authorization**: Workspace-scoped permissions with API key and session-based authentication.
* **Database management**: Connects to ClickHouse (via R2DBC reactive driver) for analytics, MySQL (via JDBI) for transactional state, Redis (via Redisson) for caching and streams, and MinIO for file storage.
* **Event processing**: In-memory event bus (Guava AsyncEventBus) with virtual threads, plus Redis Streams for distributed async workflows like online scoring and experiment aggregation.
* **Schema migrations**: Liquibase-based migrations for both MySQL and ClickHouse, run automatically on startup.
* **LLM proxy**: Integrates with multiple LLM providers (OpenAI, Anthropic, Google Gemini, and others) for the playground and LLM-as-judge evaluators.
For observability, Opik uses OpenTelemetry due to its vendor-neutral approach and wide support across languages
and frameworks. It provides a single, consistent way to collect telemetry data from all services and applications.
*You can find the full backend codebase in GitHub under the [`apps/opik-backend`](https://github.com/comet-ml/opik/tree/main/apps/opik-backend) folder.*
## Python Backend Service
Opik includes a Python backend service (Flask + Gunicorn) that handles workloads requiring Python execution:
* **Evaluators**: Executes custom Python evaluation code in sandboxed subprocesses with configurable timeouts, memory limits, and network isolation.
* **Optimization Studio**: Runs LLM-based optimization and prompt engineering workflows with concurrent job management.
* **Background jobs**: Uses Redis Queue (RQ) for asynchronous job processing, including evaluation runs and optimization tasks.
The Java backend calls the Python backend for evaluator execution, and both services share Redis for job coordination.
*You can find the full Python backend codebase in GitHub under the [`apps/opik-python-backend`](https://github.com/comet-ml/opik/tree/main/apps/opik-python-backend) folder.*
## Frontend Application
Opik's frontend is a TypeScript + React single-page application built with Vite and served by Nginx. The Nginx
server handles two roles:
* **Static file serving**: Serves the built React SPA with SPA-style routing (fallback to `index.html`).
* **API reverse proxy**: Routes `/api/*` requests to the Java backend by stripping the `/api` prefix and proxying to port 8080. This includes WebSocket upgrade support for streaming endpoints.
The frontend uses TanStack Router for file-based routing, TanStack React Query for server state management, and Zustand for client-side state.
*You can find the full frontend codebase in GitHub under the [`apps/opik-frontend`](https://github.com/comet-ml/opik/tree/main/apps/opik-frontend) folder.*
## SDKs
Opik provides SDKs for Python and TypeScript. Both SDKs implement asynchronous batching to optimize network
efficiency — they accumulate individual trace and span operations and send them as bulk requests to the backend's
batch endpoints (`POST /v1/private/traces/batch`, `POST /v1/private/spans/batch`, etc.).
| | Python SDK | TypeScript SDK |
| --------------- | --------------------------------------------------------------- | ------------------------------------------------------ |
| **HTTP client** | httpx | fetch API |
| **Batching** | Message queue + batch manager with memory-capped batches (50MB) | Debounce-based batch queue (default 300ms / 100 items) |
| **Retries** | Exponential backoff (0.5s–10s) | Default 2 retries |
*You can find the SDK codebases in GitHub under [`sdks/python`](https://github.com/comet-ml/opik/tree/main/sdks/python) for the Python SDK
and [`sdks/typescript`](https://github.com/comet-ml/opik/tree/main/sdks/typescript) for the TypeScript SDK.*
## ClickHouse
ClickHouse is a column-oriented OLAP database optimized for fast analytics on large datasets. Opik uses ClickHouse
for data that requires near real-time ingestion and analytical queries:
* Traces and spans (LLM call records)
* Feedback scores and evaluations
* Experiment items and results
* Dataset items
The backend connects to ClickHouse via HTTP (port 8123) using a reactive R2DBC driver for non-blocking queries. Async inserts are enabled for high-throughput ingestion with configurable batching and deduplication.
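To see what the async-insert settings resolve to on a running ClickHouse node, one option (a sketch, assuming direct access to the HTTP interface on its default port with default credentials) is to query `system.settings`:

```bash
# Show the async-insert settings on the ClickHouse node; ClickHouse accepts
# queries as the POST body on its HTTP interface.
curl 'http://localhost:8123/' \
  --data-binary "SELECT name, value FROM system.settings WHERE name LIKE 'async_insert%'"
```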
In Kubernetes deployments, ClickHouse is managed by the [Altinity ClickHouse Operator](https://github.com/Altinity/clickhouse-operator), which handles cluster provisioning, scaling, and monitoring. ZooKeeper provides distributed coordination for replica synchronization.
Liquibase automates schema management for ClickHouse, with migrations applied automatically on startup.
## MySQL
Opik uses MySQL for ACID-compliant transactional storage of lower-volume but critical data:
* Workspace and user management
* Project metadata
* Dataset and prompt definitions (with versioning)
* Feedback definitions
* Automation rules and alert configurations
The backend connects via JDBC with connection pooling and supports AWS RDS with IAM authentication for cloud deployments.
As with ClickHouse, Liquibase automates schema management for MySQL, with migrations applied automatically on startup.
## Redis
Redis serves multiple roles in Opik's architecture:
* **Distributed cache**: High-speed lookups for sessions, API key resolution, and frequently accessed data.
* **Distributed locks**: Coordinating safe access to shared resources across backend replicas (configurable TTL).
* **Rate limiting**: Token-bucket rate limiting to enforce throughput limits per user and workspace.
* **Streams**: Redis Streams power asynchronous workflows:
* *Online evaluation*: Consumer groups process LLM-as-judge scoring and Python evaluator results with automatic message claiming for fault tolerance.
* *Experiment aggregation*: Debounced recomputation of experiment metrics across distributed backend instances.
* **Job queue**: Redis Queue (RQ) coordinates background jobs between the Java backend and the Python backend for evaluator execution and optimization tasks.
## MinIO (S3-Compatible Storage)
Opik uses MinIO as an S3-compatible object store for binary data that doesn't belong in the relational or analytical databases:
* Trace attachments (logs, screenshots, etc.)
* Dataset file uploads (CSV imports)
* Experiment artifacts
* Custom evaluation code
In production deployments, MinIO can be replaced with any S3-compatible storage service (e.g., AWS S3, Google Cloud Storage).
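Because the store speaks the S3 API, any S3 client can inspect it. A minimal sketch with the AWS CLI, assuming a local MinIO instance on its default S3 port (credentials depend on your deployment's MinIO configuration):

```bash
# Point the AWS CLI at the local MinIO endpoint and list buckets.
aws --endpoint-url http://localhost:9000 s3 ls
```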
## Observability
Opik is built on top of open-source infrastructure (MySQL, Redis, ClickHouse, Kubernetes), making it straightforward to integrate with popular observability stacks such as Grafana and Prometheus:
* **OpenTelemetry**: All three application services (Java backend, Python backend, and frontend Nginx) support OpenTelemetry instrumentation. An optional OpenTelemetry Collector can be deployed to aggregate traces and metrics, forwarding them to Jaeger, Grafana, or any OTLP-compatible backend.
* **ClickHouse Operator**: Provides real-time performance monitoring and metric exports to Grafana/Prometheus.
* **Standard monitoring**: MySQL, Redis, and Kubernetes all have well-documented strategies for monitoring with Prometheus exporters and Grafana dashboards.
***
## Scaling Opik
Learn best practices and configurations for running Opik in production, ensuring resilience and scalability for mission-critical workloads.
Opik is built to power mission-critical workloads at scale. Whether you're running a small proof of concept or a high-volume enterprise deployment, Opik adapts to your needs. Its stateless service architecture and ClickHouse-backed storage make it highly resilient, horizontally scalable, and able to keep pace as your data grows.
This guide outlines recommended configurations and best practices for running Opik in production.
## Reference Deployment: Live Production Environment
The following is based on an active, high-volume Opik deployment serving real users in production. Use it as a reference when planning your own infrastructure.
### Workload Profile
| Metric | Value |
| ------------------------- | -------------------- |
| Active users (daily) | \~600 |
| Traces ingested per day | 4–6 million |
| Weekly data ingestion | \~100 GB |
| Total traces stored | 40 million (400 GB) |
| Total spans stored | 250 million (3.1 TB) |
| Total data on disk | 5 TB |
| Select queries per second | \~80 |
| Insert queries per second | \~20 |
| Rows inserted per minute | Up to 75K |
### Infrastructure
**Opik Backend** — 10 pods
* 3 CPU cores, 5 GB memory per pod
**Opik Python Backend** — 12 pods
* 1.5 CPU cores, 2 GB memory per pod
**Opik Frontend** — 3 pods
* Lightweight reverse proxy, minimal CPU and memory requirements
**ClickHouse** — 2 replicas, 1 shard
* 30 CPU cores, 230 GB memory per replica
* 15 TB gp3 SSD storage per replica (3000 IOPS, 250 MiB/s throughput)
### Key Configuration
| Setting | Value |
| ------------------------ | ------------------------------------------- |
| Async ClickHouse inserts | Enabled |
| Rate limiting | 10,000 events per 60 seconds per client |
| Max query execution time | 60 seconds |
| Max memory per query | 10 GB |
| Ingestion method | Batch endpoints (up to 1,000 items/request) |
This deployment runs comfortably at 10–20% average CPU utilization on ClickHouse, with headroom for traffic spikes up to 40–50%.
## Built for Growth
Opik is designed with flexibility at its core. As your data grows and query volumes increase, Opik grows with you.
* **Horizontal scaling**: Add more replicas of services to instantly handle more traffic.
* **Vertical scaling**: Increase CPU, memory, or storage to handle denser workloads.
* **Seamless elasticity**: Scale out during peak usage and scale back during quieter periods.
For larger workloads, ClickHouse can be scaled to support enterprise-level deployments. A common configuration includes:
* 62 CPU cores
* 256 GB RAM
* 25 TB disk space
ClickHouse's read path can also scale horizontally by increasing replicas, ensuring Opik continues to deliver high performance as usage grows.
## Resilient Services Cluster
Opik services are stateless and fault-tolerant, ensuring high availability across environments. Recommended resources:
| Environment | CPU (vCPU) | RAM (GB) |
| ----------- | ---------- | -------- |
| Development | 4 | 8 |
| Production | 13 | 32 |
### Instance Guidance
| Deployment | Instance | vCPUs | Memory (GiB) |
| ------------ | ----------- | ----- | ------------ |
| Dev (small) | c7i.large | 2 | 4 |
| Dev | c7i.xlarge | 4 | 8 |
| Prod (small) | c7i.2xlarge | 8 | 16 |
| Prod | c7i.4xlarge | 16 | 32 |
### Backend Service (Scales to Demand)
| Metric | Dev | Prod Small | Prod Large |
| ------------ | --- | ---------- | ---------- |
| Replicas | 2 | 5 | 7 |
| CPU cores | 1 | 2 | 2 |
| Memory (GiB) | 2 | 9 | 12 |
### Frontend Service (Always Responsive)
| Metric | Dev | Prod Small | Prod Large |
| ---------------- | --- | ---------- | ---------- |
| Replicas | 2 | 3 | 5 |
| CPU (millicores) | 5 | 50 | 50 |
| Memory (MiB) | 16 | 32 | 64 |
## ClickHouse: High-Performance Storage
At the heart of Opik's scalability is ClickHouse, a proven, high-performance analytical database designed for large-scale workloads. Opik leverages ClickHouse for storing traces and spans, ensuring fast queries, robust ingestion, and uncompromising reliability.
### Instance Types
Memory-optimized instances are recommended, with a minimum 4:1 memory-to-CPU ratio:
| Deployment | Instance |
| ---------- | ----------- |
| Small | m7i.2xlarge |
| Medium | m7i.4xlarge |
| Large | m7i.8xlarge |
### Replication Strategy
* **Development**: 1 replica
* **Production**: 2 replicas
For efficiency, always scale vertically before adding more replicas.
### CPU & Memory Guidance
Target 10–20% CPU utilization, with safe spikes up to 40–50%.
Maintain at least a 4:1 memory-to-CPU ratio (extend to 8:1 for very large environments).
| Deployment | CPU cores | Memory (GiB) |
| ------------------ | --------- | ------------ |
| Minimum | 2 | 8 |
| Development | 4 | 16 |
| Production (small) | 6 | 24 |
| Production | 32 | 128 |
### Disk Recommendations
To ensure reliable performance under heavy load:
| Volume | Value |
| ---------- | ----------------------------- |
| Family | SSD |
| Type | gp3 |
| Size | 8–16 TiB (workload dependent) |
| IOPS | 3000 |
| Throughput | 250 MiB/s |
Opik's ClickHouse layer is resilient even under sustained, large-scale ingestion, ensuring queries stay fast.
## Ingestion Best Practices
How you send data to Opik matters more than how much hardware you run. Optimizing your ingestion pattern is the single most impactful thing you can do for performance.
### Use Batch Endpoints
Opik provides batch ingestion endpoints that accept up to **1,000 items per request**:
* `POST /v1/private/traces/batch`
* `POST /v1/private/spans/batch`
Instead of sending 1,000 individual HTTP requests (each triggering a separate database insert), a single batch request handles them all at once. This dramatically reduces connection overhead and ClickHouse insert pressure.
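If you're calling the REST API directly rather than through an SDK, a batch request looks roughly like this. The payload shape and field names here are a sketch — confirm the exact schema against the [REST API reference](/reference/rest-api/overview):

```bash
# Send multiple traces in a single request instead of one request per trace.
# Field names in the body are assumptions — check the REST API reference.
curl -X POST "https://www.comet.com/opik/api/v1/private/traces/batch" \
  -H "Authorization: $OPIK_API_KEY" \
  -H "Comet-Workspace: default" \
  -H "Content-Type: application/json" \
  -d '{
        "traces": [
          {"name": "chat-request", "project_name": "demo", "start_time": "2024-01-01T00:00:00Z"},
          {"name": "chat-request", "project_name": "demo", "start_time": "2024-01-01T00:00:05Z"}
        ]
      }'
```

Both official SDKs do exactly this under the hood: they accumulate individual operations and flush them to the batch endpoints as bulk requests.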