Opik's MCP server

Opik’s MCP server is a Python 3.13+ package that connects your AI host (Claude Code, Cursor, VS Code Copilot, MCP Inspector) directly to your Opik workspace, so you can read traces, log scores, save prompt versions, and ask Ollie investigative questions, all from the chat.

opik-mcp has been rewritten in Python. If you previously installed the npx-based JavaScript server, replace npx -y opik-mcp with uvx opik-mcp in the snippets below.

Before you start

You will need:

  • OPIK_API_KEY — generate one at comet.com/api/my/settings.
  • COMET_WORKSPACE — the lowercase workspace name from your Comet URL. For example, the URL https://www.comet.com/acme-ai/... gives COMET_WORKSPACE=acme-ai.
  • uv installed locally. The fastest way is brew install uv (macOS) or curl -LsSf https://astral.sh/uv/install.sh | sh. uvx (bundled with uv) fetches and runs the latest published opik-mcp on demand — no global install required.
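A quick pre-flight check can catch a missing prerequisite before the host tries to launch the server. The sketch below is illustrative and not part of opik-mcp: the helper names are made up for this page, and it assumes the workspace is the first path segment of the Comet URL, as in the acme-ai example.

```python
import os
import shutil
from urllib.parse import urlparse


def workspace_from_url(url: str) -> str:
    """Return the first path segment of a Comet URL, lowercased.

    Assumption: the workspace name is the first path segment of the URL.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        raise ValueError(f"no workspace segment in {url!r}")
    return segments[0].lower()


def missing_prerequisites(env=None, which=shutil.which) -> list[str]:
    """List the prerequisites from this page that are not satisfied."""
    env = os.environ if env is None else env
    missing = [
        name for name in ("OPIK_API_KEY", "COMET_WORKSPACE") if not env.get(name)
    ]
    # uvx ships with uv, so a missing uvx executable means uv is not installed.
    if which("uvx") is None:
        missing.append("uv (provides uvx)")
    return missing
```

Calling missing_prerequisites() with no arguments inspects the current environment and PATH; an empty list means all three prerequisites are in place.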

Pre-release note: opik-mcp is not yet published to PyPI. Until the first PyPI release lands, replace uvx opik-mcp in any snippet on this page with uvx --from git+https://github.com/comet-ml/opik-mcp.git opik-mcp.
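If you generate host configurations from a script, the substitution described in the note can be captured in one place. This is a minimal sketch with a hypothetical helper name; the only values it uses are the two command lines given on this page.

```python
# Git source quoted from the pre-release note on this page.
GIT_SOURCE = "git+https://github.com/comet-ml/opik-mcp.git"


def uvx_command(pre_release: bool) -> list[str]:
    """Return the command line that launches opik-mcp through uvx."""
    if pre_release:
        # Until the first PyPI release, install straight from the repository.
        return ["uvx", "--from", GIT_SOURCE, "opik-mcp"]
    return ["uvx", "opik-mcp"]
```

In a host configuration, the first element is the "command" value and the remaining elements are the "args" list.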

Setting up the MCP server

Add the server with one command:

$ claude mcp add --transport stdio opik-mcp \
    --env OPIK_API_KEY=<your-key> \
    --env COMET_WORKSPACE=<your-workspace> \
    -- uvx opik-mcp

Or edit ~/.claude.json directly:

{
  "mcpServers": {
    "opik-mcp": {
      "type": "stdio",
      "command": "uvx",
      "args": ["opik-mcp"],
      "env": {
        "OPIK_API_KEY": "<your-key>",
        "COMET_WORKSPACE": "<your-workspace>"
      }
    }
  }
}
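If ~/.claude.json already contains other servers or settings, hand-editing risks clobbering them. The following sketch merges the entry shown above into an existing configuration without touching other keys. The function names are hypothetical, and it assumes the file is plain JSON with a top-level "mcpServers" object as in the example.

```python
import json
from pathlib import Path


def add_opik_server(config: dict, api_key: str, workspace: str) -> dict:
    """Return a copy of config with the opik-mcp entry from this page added."""
    updated = json.loads(json.dumps(config))  # deep copy of JSON-compatible data
    servers = updated.setdefault("mcpServers", {})
    servers["opik-mcp"] = {
        "type": "stdio",
        "command": "uvx",
        "args": ["opik-mcp"],
        "env": {"OPIK_API_KEY": api_key, "COMET_WORKSPACE": workspace},
    }
    return updated


def update_claude_config(path: Path, api_key: str, workspace: str) -> None:
    """Rewrite the JSON file at path in place; back it up first if it matters."""
    config = json.loads(path.read_text()) if path.exists() else {}
    merged = add_opik_server(config, api_key, workspace)
    path.write_text(json.dumps(merged, indent=2) + "\n")
```

A call such as update_claude_config(Path.home() / ".claude.json", key, workspace) applies the change; restart Claude Code afterwards as described below.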

Restart Claude Code, verify with /mcp (opik-mcp should appear as connected), and then ask in the chat: “list my Opik projects”.

Self-hosted Opik. Add COMET_URL_OVERRIDE to the env block (and OPIK_URL if Opik lives at a non-default path). ask_ollie and run_experiment are available on Comet Cloud only — on self-hosted those calls fail at dispatch; use read / list / write directly.
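The self-hosted differences can be sketched as a small helper that builds the env block and flags the Cloud-only tools. The function names are hypothetical, and this page does not specify the exact value format for COMET_URL_OVERRIDE or OPIK_URL, so the caller is assumed to supply them from their own deployment.

```python
from typing import Optional

# Tools this page lists as available on Comet Cloud only.
CLOUD_ONLY_TOOLS = frozenset({"ask_ollie", "run_experiment"})


def self_hosted_env(
    api_key: str,
    workspace: str,
    url_override: str,
    opik_url: Optional[str] = None,
) -> dict:
    """Build the env block for a self-hosted deployment."""
    env = {
        "OPIK_API_KEY": api_key,
        "COMET_WORKSPACE": workspace,
        "COMET_URL_OVERRIDE": url_override,
    }
    # OPIK_URL is only needed when Opik lives at a non-default path.
    if opik_url is not None:
        env["OPIK_URL"] = opik_url
    return env


def is_available(tool: str, self_hosted: bool) -> bool:
    """Report whether a tool can be expected to work in this deployment."""
    return not (self_hosted and tool in CLOUD_ONLY_TOOLS)
```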

Using the MCP server

The tools at a glance

Tool            Purpose
read            Universal read by id / name / opik:// URI.
list            Universal list with optional name filter and pagination.
ask_ollie       Investigate or synthesize via the Opik in-product assistant.
write           Universal write: log traces/spans, score, comment, save prompts, manage test suites and experiments.
schema          Introspect write-operation schemas (used by the LLM to construct valid payloads).
run_experiment  Run an evaluation experiment end-to-end via Ollie.
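Under the hood, an MCP host reaches these tools through JSON-RPC 2.0 messages: tools/list to discover them and tools/call to invoke one. The sketch below only builds those messages and does not talk to a server. The helper names are hypothetical, and the argument name used in the usage note is an assumption, since this page does not document each tool's input schema; the schemas returned by tools/list are the authoritative source.

```python
import itertools

# Monotonic request ids, as JSON-RPC expects a unique id per request.
_ids = itertools.count(1)


def tools_list_request() -> dict:
    """Build the MCP request that asks a server for its tool definitions."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": "tools/list"}


def tool_call_request(name: str, arguments: dict) -> dict:
    """Build the MCP request that invokes one tool by name."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
```

For example, tool_call_request("list", {"entity_type": "project"}) builds a call to the list tool, where "entity_type" is a placeholder argument name rather than a documented one.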

Browsing your workspace

list my Opik projects

what was the most recent trace logged to the “demo” project?

show me trace <trace-id>

Scoring, commenting, saving prompts

score trace <trace-id> 0.9 on helpfulness with reason “great recovery”

comment “retry with temperature=0” on span <span-id>

save the following text as a new version of the “rerank-system” prompt: …

For the full set of write operations and their payload shapes, ask the host “show me the schema for trace.create” (calls the schema tool) or see the README.

Asking Ollie

For investigative or cross-entity questions:

why are spans in the “demo” project slower this week than last?

compare experiments “rerank-v2” and “rerank-v3” on factuality

ask_ollie returns a thread_id that you can pass back on follow-up questions to preserve context. For more about Ollie itself, see the Ollie documentation. Before running write-style prompts in shared workspaces, read Ollie & auto-approve below.
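The thread_id round trip can be sketched as a small piece of client-side state. Only the thread_id name comes from this page; the "question" argument name and the assumption that the parsed tool result is a dict carrying a "thread_id" key are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OllieThread:
    """Remember the most recent thread_id so follow-ups keep their context."""

    thread_id: Optional[str] = None

    def arguments(self, question: str) -> dict:
        """Build ask_ollie arguments, adding thread_id once one is known."""
        args = {"question": question}  # hypothetical argument name
        if self.thread_id is not None:
            args["thread_id"] = self.thread_id
        return args

    def remember(self, result: dict) -> None:
        """Store the thread_id from a parsed result (assumed dict shape)."""
        self.thread_id = result.get("thread_id", self.thread_id)
```

The first call goes out without a thread_id; after remember() has seen a result, every later call carries it.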

Ollie & auto-approve

By default, writes that Ollie performs mid-stream (scores, comments, prompt versions, test-suite items) execute without a per-action confirmation step. Each auto-approved write is logged as a JSON audit row on the opik_mcp.audit Python logger.
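Because the audit rows go to a standard Python logger, they can be captured with an ordinary logging handler. The sketch below assumes the handler runs in the same process as the server, that each row is the log message itself as a JSON string, and that rows are emitted at INFO level or above; none of these details are confirmed on this page.

```python
import json
import logging


class AuditCollector(logging.Handler):
    """Collect rows emitted on the opik_mcp.audit logger."""

    def __init__(self) -> None:
        super().__init__()
        self.rows = []

    def emit(self, record: logging.LogRecord) -> None:
        message = record.getMessage()
        try:
            self.rows.append(json.loads(message))
        except json.JSONDecodeError:
            # Keep anything that is not valid JSON instead of dropping it.
            self.rows.append({"raw": message})


def attach_audit_collector() -> AuditCollector:
    """Attach a collector to the audit logger named on this page."""
    collector = AuditCollector()
    logger = logging.getLogger("opik_mcp.audit")
    logger.setLevel(logging.INFO)  # assumption: rows are INFO or higher
    logger.addHandler(collector)
    return collector
```

In practice you would more likely route this logger to a file or log aggregator through your normal logging configuration; the collector just shows where the rows arrive.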

To require manual confirmation instead, set OPIK_MCP_AUTO_APPROVE=disabled in the server’s env block. Ollie’s confirmation requests then surface as typed errors, and you can re-issue the corresponding write manually.
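If your host configuration is generated by a script, the setting can be applied to the server entry programmatically. A minimal sketch with a hypothetical helper name, assuming the entry has the shape shown in the setup section:

```python
def require_manual_confirmation(server_entry: dict) -> dict:
    """Return a copy of a server entry with auto-approve turned off."""
    env = dict(server_entry.get("env", {}))
    env["OPIK_MCP_AUTO_APPROVE"] = "disabled"  # value taken from this page
    return {**server_entry, "env": env}
```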

Known host limits

  • Cursor enforces a 60-second hard tool-call timeout that does not reset on progress notifications. Long ask_ollie turns will fail on Cursor. For long-running investigations, use Claude Code or VS Code Copilot.
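The difference between a hard timeout and one that resets on progress is easy to see in a toy model. The sketch below is not Cursor's implementation; it uses asyncio.wait_for to show that a call which keeps reporting progress is still cancelled once a fixed deadline passes. Durations are in seconds and are scaled down for the demonstration.

```python
import asyncio


async def long_tool_call(duration: float, progress_every: float, on_progress) -> str:
    """Simulate a tool call that reports progress until it finishes."""
    elapsed = 0.0
    while elapsed < duration:
        await asyncio.sleep(progress_every)
        elapsed += progress_every
        on_progress(elapsed)
    return "done"


async def call_with_hard_timeout(
    duration: float, timeout: float, progress_every: float = 0.01
):
    """Run the simulated call under a deadline that ignores progress events."""
    progress = []
    try:
        result = await asyncio.wait_for(
            long_tool_call(duration, progress_every, progress.append), timeout
        )
    except asyncio.TimeoutError:
        result = "timed out"
    return result, len(progress)
```

Running asyncio.run(call_with_hard_timeout(0.5, 0.1)) reports a timeout even though progress events were recorded, which mirrors why long ask_ollie turns fail on a host with a fixed limit.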

Example conversation

A typical investigative loop using Claude Code:

You: Why did the experiment “gpt-4o-rerank-v3” regress on factuality?

Claude: (calls ask_ollie) Three traces failed because the reranker dropped the system message. The remaining 12 traces scored above 0.8…

You: Score the bottom 3 traces 0.2 with reason “dropped system message”.

Claude: (calls write with score.create ×3) Done — three scores recorded on traces <id-1>, <id-2>, <id-3>.