Opik’s MCP server is a Python 3.13+ package that connects your AI host (Claude Code, Cursor, VS Code Copilot, MCP Inspector) directly to your Opik workspace — read traces, log scores, save prompt versions, and ask Ollie investigative questions, all from the chat.
opik-mcp has been rewritten in Python. If you previously installed the npx-based JavaScript server, replace npx -y opik-mcp with uvx opik-mcp in the snippets below.
You will need:
OPIK_API_KEY: generate one at comet.com/api/my/settings.
COMET_WORKSPACE: the lowercase workspace name from your Comet URL. For example, https://www.comet.com/acme-ai/... → COMET_WORKSPACE=acme-ai.
uv installed locally. The fastest way is brew install uv (macOS) or curl -LsSf https://astral.sh/uv/install.sh | sh. uvx (bundled with uv) fetches and runs the latest published opik-mcp on demand, so no global install is required.
Pre-release note: opik-mcp is not yet published to PyPI. Until the first PyPI release lands, replace uvx opik-mcp in any snippet on this page with uvx --from git+https://github.com/comet-ml/opik-mcp.git opik-mcp.
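If you want to derive COMET_WORKSPACE from a Comet URL rather than copying it by hand, the rule above can be sketched in a few lines of Python. This is a minimal sketch that assumes the workspace is always the first path segment of the URL, as in the acme-ai example; workspace_from_url is a hypothetical helper name and is not part of opik-mcp.

```python
from urllib.parse import urlparse


def workspace_from_url(url: str) -> str:
    """Return the first path segment of a Comet URL, lowercased.

    Assumption: the workspace name is the first path segment.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        raise ValueError(f"no workspace segment found in {url!r}")
    return segments[0].lower()


if __name__ == "__main__":
    # Example URL shaped like the one in the requirements list.
    print(workspace_from_url("https://www.comet.com/acme-ai/projects"))
```

Export the printed value as COMET_WORKSPACE before starting your host.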
Add the server with one command:
Or edit ~/.claude.json directly:
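As a rough guide to the shape of that entry, the Python sketch below prints a JSON fragment you could merge into the file by hand. It assumes the conventional mcpServers layout (command, args, env) that MCP hosts commonly use and the server name opik-mcp; check the exact keys against your host's documentation. opik_mcp_entry is a hypothetical helper, and the placeholder values stand in for your own credentials.

```python
import json
import os


def opik_mcp_entry(api_key: str, workspace: str) -> dict:
    """Build an mcpServers entry that launches opik-mcp through uvx.

    Assumption: the host reads the common command/args/env layout.
    """
    return {
        "mcpServers": {
            "opik-mcp": {
                "command": "uvx",
                "args": ["opik-mcp"],
                "env": {
                    "OPIK_API_KEY": api_key,
                    "COMET_WORKSPACE": workspace,
                },
            }
        }
    }


if __name__ == "__main__":
    entry = opik_mcp_entry(
        os.environ.get("OPIK_API_KEY", "<your-api-key>"),
        os.environ.get("COMET_WORKSPACE", "<your-workspace>"),
    )
    # Print the fragment instead of writing the file, so nothing is overwritten.
    print(json.dumps(entry, indent=2))
```

The sketch only prints the fragment; merging it by hand avoids clobbering other settings already stored in the file.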
Restart Claude Code, verify with /mcp (opik-mcp should appear as
connected), and then ask in the chat: “list my Opik projects”.
Self-hosted Opik. Add COMET_URL_OVERRIDE to the env block (and OPIK_URL
if Opik lives at a non-default path). ask_ollie and run_experiment are
available on Comet Cloud only — on self-hosted those calls fail at dispatch;
use read / list / write directly.
list my Opik projects
what was the most recent trace logged to the “demo” project?
show me trace <trace-id>
score trace <trace-id> 0.9 on helpfulness with reason “great recovery”
comment “retry with temperature=0” on span <span-id>
save the following text as a new version of the “rerank-system” prompt: …
For the full set of write operations and their payload shapes, ask the host
“show me the schema for trace.create” (calls the schema tool) or see the
README.
For investigative or cross-entity questions:
why are spans in the “demo” project slower this week than last?
compare experiments “rerank-v2” and “rerank-v3” on factuality
ask_ollie returns a thread_id you can pass back on follow-ups to preserve
context. For more about Ollie itself, see Ollie.
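If you drive ask_ollie from your own client code rather than from a chat host, the follow-up pattern might look like the sketch below. Only the tool name ask_ollie and the thread_id field come from this page. The question argument name, the dictionary-shaped result, and the call_tool callable are assumptions made for illustration, and fake_call_tool is a stand-in so the example runs without a server.

```python
from typing import Any, Callable, Optional

ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


class OllieThread:
    """Remember the thread_id from each reply and send it on follow-ups."""

    def __init__(self, call_tool: ToolCaller) -> None:
        self._call_tool = call_tool
        self.thread_id: Optional[str] = None

    def ask(self, question: str) -> dict[str, Any]:
        # "question" is a hypothetical argument name, not a documented one.
        args: dict[str, Any] = {"question": question}
        if self.thread_id is not None:
            args["thread_id"] = self.thread_id
        result = self._call_tool("ask_ollie", args)
        # Keep the previous thread_id if the reply does not carry one.
        self.thread_id = result.get("thread_id", self.thread_id)
        return result


def fake_call_tool(name: str, args: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for a real MCP client call; echoes the question back."""
    return {
        "thread_id": args.get("thread_id", "thread-1"),
        "answer": f"echo: {args['question']}",
    }


if __name__ == "__main__":
    thread = OllieThread(fake_call_tool)
    thread.ask("why are spans slower this week?")
    follow_up = thread.ask("which span type regressed most?")
    print(thread.thread_id, follow_up["answer"])
```

Injecting the caller keeps the sketch independent of any particular MCP client library; swap fake_call_tool for whatever your client exposes.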
See Ollie & auto-approve below before running write-style prompts in shared
workspaces.
By default, writes that Ollie performs mid-stream (scores, comments, prompt
versions, test-suite items) execute without a per-action confirmation step.
Each auto-approved write is logged as a JSON audit row on the opik_mcp.audit
Python logger.
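One way to capture those rows is a standard logging handler attached to the opik_mcp.audit logger, as in the sketch below. It uses only the Python standard library. The INFO level, the field names in the simulated row, and the AuditCollector class are assumptions for illustration; the real row schema is not documented here. The handler also has to be registered in the same process that runs the server for it to see anything.

```python
import json
import logging


class AuditCollector(logging.Handler):
    """Collect audit rows, parsing each log message as JSON."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list = []

    def emit(self, record: logging.LogRecord) -> None:
        message = record.getMessage()
        try:
            self.rows.append(json.loads(message))
        except json.JSONDecodeError:
            # Keep non-JSON messages rather than dropping them silently.
            self.rows.append({"raw": message})


audit_logger = logging.getLogger("opik_mcp.audit")
audit_logger.setLevel(logging.INFO)  # assumed level; adjust if rows use another
collector = AuditCollector()
audit_logger.addHandler(collector)

# Simulated audit row with hypothetical fields, standing in for a real write.
audit_logger.info(json.dumps({"action": "score.create", "auto_approved": True}))
print(collector.rows)
```

In practice you would forward the parsed rows to a file or log pipeline instead of holding them in memory.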
To require manual confirmation instead, set OPIK_MCP_AUTO_APPROVE=disabled in
the server’s env block. Ollie’s confirmation requests then surface as typed
errors that you can re-issue manually.
ask_ollie and run_experiment are available on Comet Cloud only — on
self-hosted those calls fail at dispatch; use read / list / write
directly.
ask_ollie turns will fail on Cursor. For long-running investigations, use Claude Code or VS Code Copilot.
A typical investigative loop using Claude Code:
You: Why did the experiment “gpt-4o-rerank-v3” regress on factuality?
Claude: (calls ask_ollie) Three traces failed because the reranker dropped the system message. The remaining 12 traces scored above 0.8…
You: Score the bottom 3 traces 0.2 with reason “dropped system message”.
Claude: (calls write with score.create ×3) Done. Three scores recorded on traces <id-1>, <id-2>, <id-3>.