Opik's MCP server
Opik’s MCP server is a Python 3.13+ package that connects your AI host (Claude Code, Cursor, VS Code Copilot, MCP Inspector) directly to your Opik workspace. From the chat, you can read traces, log scores, save prompt versions, and ask Ollie investigative questions.
opik-mcp has been rewritten in Python. If you previously installed the npx-based JavaScript server, replace npx -y opik-mcp with uvx opik-mcp in your existing host configuration.
Before you start
You will need:
- OPIK_API_KEY: generate one at comet.com/api/my/settings.
- COMET_WORKSPACE: the lowercase workspace name from your Comet URL. For example, https://www.comet.com/acme-ai/... → COMET_WORKSPACE=acme-ai.
- uv installed locally. The fastest way is brew install uv (macOS) or curl -LsSf https://astral.sh/uv/install.sh | sh. uvx (bundled with uv) fetches and runs the latest published opik-mcp on demand, so no global install is required.
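The COMET_WORKSPACE rule can be written as a small helper. This is useful if a setup script needs to derive the variable from a URL the user pastes. The Python sketch below uses workspace_from_url, a hypothetical name of our own that is not part of opik-mcp. It assumes the workspace is always the first path segment of the URL, as in the acme-ai example.

```python
from urllib.parse import urlparse


def workspace_from_url(url: str) -> str:
    """Return COMET_WORKSPACE for a Comet URL (hypothetical helper, not part of opik-mcp).

    Assumes the workspace is the first path segment, as in
    https://www.comet.com/acme-ai/... -> acme-ai.
    """
    # Split the path and drop the empty pieces left by leading/trailing slashes.
    segments = [part for part in urlparse(url).path.split("/") if part]
    if not segments:
        raise ValueError(f"no workspace segment in {url!r}")
    # The setup instructions ask for the lowercase workspace name.
    return segments[0].lower()


if __name__ == "__main__":
    # "my-project" is a placeholder path segment.
    print(f"COMET_WORKSPACE={workspace_from_url('https://www.comet.com/acme-ai/my-project')}")
    # prints: COMET_WORKSPACE=acme-ai
```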
Pre-release note: opik-mcp is not yet published to PyPI. Until the first
PyPI release lands, replace uvx opik-mcp in any snippet on this page with
uvx --from git+https://github.com/comet-ml/opik-mcp.git opik-mcp.
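Only the launcher changes between the two forms, so a setup script can switch between them in one place. The sketch below does this under that assumption. opik_mcp_command and its pre_release flag are hypothetical names of our own.

```python
import shlex

# Git source taken from the pre-release note above.
GIT_SOURCE = "git+https://github.com/comet-ml/opik-mcp.git"


def opik_mcp_command(pre_release: bool = True) -> list:
    """Return the argv a host should launch for opik-mcp (hypothetical helper).

    Before the PyPI release, uvx must build the package from Git via --from;
    afterwards the plain package name is enough.
    """
    if pre_release:
        return ["uvx", "--from", GIT_SOURCE, "opik-mcp"]
    return ["uvx", "opik-mcp"]


if __name__ == "__main__":
    # shlex.join renders the argv as a copy-pasteable shell command.
    print(shlex.join(opik_mcp_command(pre_release=True)))
    print(shlex.join(opik_mcp_command(pre_release=False)))
```

Once the PyPI release lands, flipping the flag is the only change needed.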
Setting up the MCP server
Claude Code
Add the server with one command:
Or edit ~/.claude.json directly:
Restart Claude Code, verify with /mcp (opik-mcp should appear as
connected), and then ask in the chat: “list my Opik projects”.
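The ~/.claude.json snippet itself is not reproduced here. As a rough guide to its shape, the Python sketch below builds an entry using the layout MCP hosts commonly use: a top-level mcpServers map whose entries carry command, args, and env. The key names, the server name opik-mcp, and the helper claude_code_entry are all assumptions. Check them against your Claude Code version before editing the file.

```python
import json


def claude_code_entry(api_key: str, workspace: str) -> dict:
    """Build an MCP server entry for ~/.claude.json (assumed layout, hypothetical helper).

    Assumes the host reads {"mcpServers": {name: {"command", "args", "env"}}};
    verify against your Claude Code version before merging it in.
    """
    return {
        "mcpServers": {
            "opik-mcp": {
                "command": "uvx",        # uvx fetches and runs opik-mcp on demand
                "args": ["opik-mcp"],
                "env": {
                    "OPIK_API_KEY": api_key,
                    "COMET_WORKSPACE": workspace,
                },
            }
        }
    }


if __name__ == "__main__":
    # Placeholder values; never commit a real API key.
    print(json.dumps(claude_code_entry("<your-api-key>", "acme-ai"), indent=2))
```

Merge the printed entry into the existing file rather than overwriting it, because ~/.claude.json may hold other Claude Code settings. During the pre-release period, args also needs the --from git+https://github.com/comet-ml/opik-mcp.git prefix described in the pre-release note.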
Self-hosted Opik. Add COMET_URL_OVERRIDE to the env block (and OPIK_URL
if Opik lives at a non-default path). ask_ollie and run_experiment are
available on Comet Cloud only. On self-hosted deployments those calls fail at
dispatch, so use the read, list, and write tools directly.
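The self-hosted rules reduce to two decisions: which variables go in the env block, and which of the tools named here are worth calling. The sketch below assumes, as our reading of the note, that a COMET_URL_OVERRIDE entry marks a self-hosted deployment. server_env, usable_tools, and the placeholder URL are hypothetical.

```python
from typing import Dict, Optional, Set

# Only the tool names mentioned on this page; the server exposes more.
CLOUD_ONLY_TOOLS = frozenset({"ask_ollie", "run_experiment"})
DIRECT_TOOLS = frozenset({"read", "list", "write"})


def server_env(api_key: str, workspace: str,
               comet_url: Optional[str] = None,
               opik_url: Optional[str] = None) -> Dict[str, str]:
    """Build the env block (hypothetical helper); pass comet_url when self-hosted."""
    env = {"OPIK_API_KEY": api_key, "COMET_WORKSPACE": workspace}
    if comet_url is not None:
        env["COMET_URL_OVERRIDE"] = comet_url
    if opik_url is not None:
        env["OPIK_URL"] = opik_url  # only needed when Opik lives at a non-default path
    return env


def usable_tools(env: Dict[str, str]) -> Set[str]:
    """Tools worth calling, assuming COMET_URL_OVERRIDE means self-hosted."""
    tools = set(DIRECT_TOOLS)
    if "COMET_URL_OVERRIDE" not in env:
        tools |= CLOUD_ONLY_TOOLS  # Comet Cloud: ask_ollie and run_experiment dispatch
    return tools


if __name__ == "__main__":
    # Placeholder URL for a self-hosted install.
    env = server_env("<your-api-key>", "acme-ai", comet_url="https://comet.example.internal")
    print(sorted(usable_tools(env)))  # ['list', 'read', 'write']
```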
Using the MCP server
The tools at a glance
Browsing your workspace
list my Opik projects
what was the most recent trace logged to the “demo” project?
show me trace <trace-id>
Scoring, commenting, saving prompts
score trace <trace-id> 0.9 on helpfulness with reason “great recovery”
comment “retry with temperature=0” on span <span-id>
save the following text as a new version of the “rerank-system” prompt: …
For the full set of write operations and their payload shapes, ask the host
“show me the schema for trace.create” (calls the schema tool) or see the
README.
Asking Ollie
For investigative or cross-entity questions:
why are spans in the “demo” project slower this week than last?
compare experiments “rerank-v2” and “rerank-v3” on factuality
ask_ollie returns a thread_id you can pass back on follow-ups to preserve
context. For more about Ollie itself, see Ollie.
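Continuing a conversation only requires remembering the last thread_id and sending it with the next question. The sketch below injects the tool-calling function so the logic can be tested without a host. The argument names question and thread_id are assumptions, so check the tool's input schema for the real ones. OllieConversation is a hypothetical helper, not part of opik-mcp.

```python
from typing import Any, Callable, Dict, Optional

# (tool_name, arguments) -> result dict. In a real client this would wrap an
# MCP session; it is injected here so the threading logic is testable.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class OllieConversation:
    """Keep the thread_id returned by ask_ollie and send it back on follow-ups."""

    def __init__(self, call_tool: ToolCaller) -> None:
        self._call_tool = call_tool
        self.thread_id: Optional[str] = None

    def ask(self, question: str) -> Dict[str, Any]:
        args: Dict[str, Any] = {"question": question}  # assumed argument name
        if self.thread_id is not None:
            args["thread_id"] = self.thread_id  # preserve context from earlier turns
        result = self._call_tool("ask_ollie", args)
        # Remember the thread for the next follow-up (keep the old one if absent).
        self.thread_id = result.get("thread_id", self.thread_id)
        return result


if __name__ == "__main__":
    calls = []

    def fake_call_tool(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
        calls.append((name, dict(args)))
        return {"thread_id": "thread-123"}  # stand-in for a real ask_ollie result

    convo = OllieConversation(fake_call_tool)
    convo.ask("why are spans in the demo project slower this week than last?")
    convo.ask("which traces are slowest?")
    print(calls[1][1]["thread_id"])  # thread-123
```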
See Ollie & auto-approve below before running write-style prompts in shared
workspaces.
Ollie & auto-approve
By default, writes that Ollie performs mid-stream (scores, comments, prompt
versions, test-suite items) execute without a per-action confirmation step.
Each auto-approved write is logged as a JSON audit row on the opik_mcp.audit
Python logger.
To require manual confirmation instead, set OPIK_MCP_AUTO_APPROVE=disabled in
the server’s env block. Ollie’s confirmation requests then surface as typed
errors that you can re-issue manually.
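The audit rows go through Python's standard logging module, so ordinary logging configuration can persist them. The sketch below makes three assumptions: the code runs inside the opik-mcp server process, each record's message is already one JSON object, and rows are logged at INFO level or above. route_audit_log, read_audit_rows, and the file name you pass are our own.

```python
import json
import logging


def route_audit_log(path: str) -> logging.Handler:
    """Append opik_mcp.audit records to a JSON Lines file (hypothetical helper)."""
    handler = logging.FileHandler(path, encoding="utf-8")
    # Write the bare message so each line stays one JSON row.
    handler.setFormatter(logging.Formatter("%(message)s"))
    audit = logging.getLogger("opik_mcp.audit")
    audit.setLevel(logging.INFO)  # assumption: audit rows are INFO or above
    audit.addHandler(handler)
    return handler


def read_audit_rows(path: str) -> list:
    """Parse the file back into one dict per auto-approved write."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

For example, calling route_audit_log("opik-mcp-audit.jsonl") at startup gives you a JSON Lines file that you can review with read_audit_rows after a session in a shared workspace.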
Known host limits
- Cursor enforces a 60-second hard tool-call timeout that does not reset on
progress notifications, so ask_ollie turns that run longer than 60 seconds fail on Cursor. For long-running investigations, use Claude Code or VS Code Copilot.
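The difference between a hard timeout and one that resets on progress is easiest to see in a simulation. The asyncio sketch below models host behavior only and does not call Cursor or opik-mcp. long_tool_call is a hypothetical stand-in for a long ask_ollie turn, and the durations are scaled down from 60 seconds to fractions of a second.

```python
import asyncio


async def long_tool_call(progress: asyncio.Queue, steps: int, step_s: float) -> str:
    """Stand-in for a long ask_ollie turn that emits one progress event per step."""
    for i in range(steps):
        await asyncio.sleep(step_s)
        progress.put_nowait(i)  # a progress notification
    return "done"


async def hard_timeout(steps: int, step_s: float, limit_s: float) -> str:
    """Cursor-style limit: the clock never resets, whatever progress arrives."""
    progress: asyncio.Queue = asyncio.Queue()
    try:
        return await asyncio.wait_for(long_tool_call(progress, steps, step_s), limit_s)
    except asyncio.TimeoutError:
        return "timed out"


async def resetting_timeout(steps: int, step_s: float, limit_s: float) -> str:
    """Idle limit: every progress notification restarts the clock."""
    progress: asyncio.Queue = asyncio.Queue()
    call = asyncio.create_task(long_tool_call(progress, steps, step_s))
    while True:
        next_event = asyncio.create_task(progress.get())
        done, _ = await asyncio.wait(
            {call, next_event}, timeout=limit_s, return_when=asyncio.FIRST_COMPLETED
        )
        if call in done:
            next_event.cancel()
            return call.result()
        if not done:  # neither progress nor a result within the window
            next_event.cancel()
            call.cancel()
            return "timed out"
        # Progress arrived in time, so loop and open a fresh window.


if __name__ == "__main__":
    # 10 steps of 0.02 s (0.2 s of work) against a 0.1 s limit.
    print(asyncio.run(hard_timeout(10, 0.02, 0.1)))       # timed out
    print(asyncio.run(resetting_timeout(10, 0.02, 0.1)))  # done
```

The hard limit fires even though progress arrives every 0.02 seconds, while the resetting limit lets the same call finish. This is why progress notifications cannot keep a long ask_ollie turn alive on Cursor.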
Example conversation
A typical investigative loop using Claude Code:
You: Why did the experiment “gpt-4o-rerank-v3” regress on factuality?
Claude: (calls ask_ollie) Three traces failed because the reranker dropped the system message. The remaining 12 traces scored above 0.8…
You: Score the bottom 3 traces 0.2 with reason “dropped system message”.
Claude: (calls write with score.create ×3) Done: three scores recorded on traces <id-1>, <id-2>, <id-3>.