Model Context Protocol: How AI Agents Connect to Your Data

The Model Context Protocol (MCP) emerged in late 2024 as the architectural solution for AI agent connectivity.


In 2023, LLMs reached a capability threshold that, in theory, made autonomous agents possible. Models could write production code, analyze complex documents, reason through multi-step problems, and explain their decision-making processes. The intelligence was there, but its hands were tied without connectivity.

Through 2023 and early 2024, the fundamental constraint on agentic AI was isolation. Models existed in a vacuum, severed from the databases, APIs, filesystems, and business tools that enable most useful work. Connecting these resources required custom integration code, and as teams tried to scale from one model accessing three data sources to five models accessing twenty sources, the engineering effort exploded super-linearly. You could build a brilliant reasoning engine, but connecting it to your production systems required building and maintaining a web of brittle, bespoke integrations.

This is the problem that MCP solves. Developed by Anthropic and subsequently open-sourced under the Linux Foundation’s Agentic AI Foundation, MCP standardizes how AI systems interact with external data and tools. Think of it as establishing the equivalent of USB-C for AI: instead of building custom connectors for every possible model-data pairing, you build to a single protocol specification, and everything connects.

This shift from bespoke integrations to standardized connectivity is what makes the transition from “chatbots” to “agents” economically viable. An agent that can autonomously navigate systems to achieve goals requires reliable, repeatable access to context. MCP provides exactly that.

The Multiplying Cost of Custom Integrations

Before MCP, connecting AI applications to data sources meant building point-to-point integrations. A development team using Claude to analyze GitHub issues, query PostgreSQL databases, and search Slack conversations needed three separate integration layers. Each required unique authentication handling, schema definitions, error management logic, and maintenance as APIs evolved.

Custom integrations scale badly. Adding a second model (say, GPT-5.2 for a different use case) meant potentially rebuilding those same three integrations with different protocols. Adding a fourth data source (maybe a CRM system) meant building new connectors for both models. This N×M complexity pattern is familiar to anyone who’s worked on integration platforms. You end up with N models times M data sources, each with distinct connection implementations, each a potential point of failure.

Frameworks like LangChain attempted to solve this by providing pre-built tools for common services. This helped, but it still required framework-specific implementations and left teams locked into orchestration patterns. More fundamentally, it didn’t solve the underlying architectural issue: the lack of a universal interface contract between the intelligence layer (the model) and the context layer (the data).

Organizations trying to build production-grade agentic systems faced a choice between investing enormous engineering resources into maintaining brittle integration code or limiting their agents to narrow, predefined workflows. Neither option enabled the kind of flexible, autonomous behavior that defines genuine agentic AI.

What is MCP?

The Model Context Protocol is an open standard that allows AI applications to connect to any data source or tool through a single, universal interface. Key features include:

  • Universal connectivity: Any AI application that supports MCP (like Claude Desktop or ChatGPT) can connect to any MCP server (like GitHub, Slack, or your internal databases) without custom integration code.
  • Client-server architecture: AI applications (hosts) connect to data providers (servers) through a standardized JSON-RPC 2.0 protocol.
  • Resources, Tools, and Prompts: MCP supports three core primitives for reading data, executing actions, and standardizing workflows.
  • Capability negotiation: Clients and servers exchange supported features during connection, ensuring backward and forward compatibility as the protocol evolves.

The Three Primitives

Resources provide read-only access to data through URI-addressable endpoints. Unlike Retrieval Augmented Generation systems that rely on vectorization and semantic search, Resources offer deterministic, structured data access. An MCP server might expose postgres://production/users/schema as a Resource, allowing an agent to examine database structure before writing queries. The protocol supports active subscriptions, meaning servers can notify clients when underlying data changes. This capability is crucial for observability agents monitoring log files or development assistants tracking code changes in real time.
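As a sketch of how this looks in practice (using the Python SDK introduced later in this article), a server might expose that schema endpoint as a Resource. The URI and function body below are illustrative only:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("Database Inspector")

@mcp.resource("postgres://production/users/schema")
def users_schema() -> str:
    """Return the column definitions for the users table."""
    # A real server would query information_schema here;
    # the static string below is purely illustrative.
    return "id SERIAL PRIMARY KEY, email TEXT NOT NULL, created_at TIMESTAMPTZ"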

Tools enable agents to perform actions and trigger side effects. These are executable functions exposed by servers, each defined by a rigorous JSON Schema specifying expected arguments and return types. When an agent needs to commit code to GitHub, query an API, or update a database record, it invokes Tools. The schema definitions get injected into the model’s context, allowing it to generate syntactically correct function calls without the host application needing to understand the business logic.

Prompts enable servers to share useful prompts and prompt templates for interacting with their services. A Sentry MCP server might expose an analyze-issue Prompt that automatically pulls relevant stack traces from Resources, formats them with debugging instructions, and presents the complete package to the model. This standardizes how teams approach recurring tasks, ensuring agents always receive optimal context for specific workflows.
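A minimal sketch of what such a Prompt could look like with the Python SDK, assuming an existing FastMCP server instance named mcp (the template text is illustrative, not Sentry's actual prompt):

@mcp.prompt()
def analyze_issue(issue_id: str) -> str:
    """Debugging template for a single issue."""
    # A production server would pull the stack trace from a Resource
    # and interpolate it here; this template is illustrative.
    return (
        f"Analyze issue {issue_id}. Review the attached stack trace, "
        "identify the likely root cause, and propose a fix with a code snippet."
    )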

How the Protocol Works

The protocol itself runs on JSON-RPC 2.0, a lightweight remote procedure call standard that supports bidirectional communication. This choice enables patterns where servers notify clients of state changes, or where servers request model completions through the client (a capability called sampling) during execution. For local integrations like connecting an IDE to a user’s filesystem, MCP uses standard input/output (stdio) transport, with the host spawning the server as a subprocess. For distributed architectures where an agent runs in the cloud and needs to connect to remote services, the protocol employs Server-Sent Events over HTTP for asynchronous updates.
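In the Python SDK (covered below), the transport is typically a one-line decision when the server starts. A minimal sketch:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("Weather Service")

if __name__ == "__main__":
    # Local integration: the host spawns this script and speaks JSON-RPC over stdio.
    mcp.run(transport="stdio")
    # For a remote deployment you would instead run an HTTP-based transport,
    # e.g. mcp.run(transport="sse"), and point clients at the server's URL.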

The architectural elegance lies in capability negotiation. When a client and server connect, they exchange supported features through an initialization handshake. The server might advertise that it supports real-time Resource updates; the client might indicate it can render interactive Prompts. This negotiation ensures backward and forward compatibility as the protocol evolves, preventing the fragmentation that plagued earlier attempts at AI integration standards.
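The handshake itself is ordinary JSON-RPC. The sketch below shows roughly what the initialize exchange carries; the field names follow the published specification, but the values and capability sets are illustrative:

# Illustrative JSON-RPC "initialize" exchange, shown as Python dicts (abridged).
initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {"sampling": {}},  # what the client can do
        "clientInfo": {"name": "example-host", "version": "1.0.0"},
    },
}

initialize_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "protocolVersion": "2024-11-05",
        "capabilities": {
            "resources": {"subscribe": True},  # server supports live Resource updates
            "tools": {},
            "prompts": {},
        },
        "serverInfo": {"name": "example-server", "version": "1.0.0"},
    },
}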

MCP in the Context Engineering Landscape

MCP addresses one dimension of a larger challenge: getting the right information to models at the right time. Context engineering, the practice of designing what information reaches a model and how, includes several complementary context retrieval techniques that often work together in production systems.

Retrieval Augmented Generation (RAG) remains the dominant approach for knowledge-intensive applications. RAG systems vectorize documents into embeddings, store them in specialized databases, and retrieve semantically similar chunks based on user queries. This approach excels when you need to search large, unstructured corpora where the relevant information isn’t known in advance. A customer support agent answering questions about product documentation benefits from RAG’s ability to find relevant passages across thousands of help articles.

MCP’s Resources primitive operates differently. Rather than semantic similarity search, Resources provide deterministic, structured access to specific data endpoints. When your agent needs the current schema of a production database or the exact contents of a configuration file, you don’t want probabilistic retrieval. You want the system to fetch precisely what was requested. The two approaches solve different problems: RAG handles “find information related to X” while MCP handles “get the specific data at location Y.”

Production systems increasingly combine both. An agent might use RAG to identify which database tables are relevant to a user’s question, then invoke MCP Resources to fetch the actual schema definitions and sample data. The RAG layer handles discovery; the MCP layer handles precise retrieval.
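A hedged sketch of that two-layer pattern, assuming a hypothetical rag_index object with a search method and an active MCP client session (read_resource is part of the Python SDK's client API; everything else here is illustrative):

async def fetch_relevant_schemas(question: str, rag_index, mcp_session) -> list:
    # 1. Discovery: semantic search over table documentation (RAG layer).
    candidate_tables = rag_index.search(question, top_k=3)  # hypothetical index API

    # 2. Precise retrieval: fetch the exact schema for each candidate via MCP Resources.
    schemas = []
    for table in candidate_tables:
        result = await mcp_session.read_resource(f"postgres://production/{table}/schema")
        schemas.append(result)

    # 3. These schemas, plus the original question, become the model's context.
    return schemas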

Context window management ties these techniques together. Modern models support increasingly large context windows, but filling them indiscriminately degrades performance and increases costs. Effective context engineering means being selective: using RAG to identify relevant documents, MCP to fetch structured data, and careful prompt design to present information in formats models process efficiently. The goal isn’t maximum context but optimal context, providing exactly what the model needs to accomplish the task.

Building With MCP: The Developer Experience

What does MCP integration look like in practice? The ecosystem provides SDKs for Python and TypeScript that abstract the JSON-RPC wire protocol complexity, letting developers focus on exposing their data and tools rather than managing connection lifecycle.

Python and FastMCP

The Python SDK introduces FastMCP, a decorator-based framework that minimizes boilerplate. A developer building an MCP server to expose weather data writes a standard Python function with type hints, and FastMCP handles the rest — schema generation, argument validation, and protocol serialization:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("Weather Service")

@mcp.tool()
def get_forecast(location: str, days: int = 3) -> dict:
    """Fetch weather forecast for a location.

    Args:
        location: City name or coordinates
        days: Number of days to forecast (1-7)
    """
    # Logic to call external weather API
    return {
        "location": location,
        "forecast": [{"day": 1, "temp": 72, "condition": "sunny"}],
    }

The @mcp.tool() decorator transforms this ordinary Python function into an MCP-compliant tool. The function signature becomes the contract: location is a required string, days is an optional integer with a default value. The docstring provides the description that gets shown to the model, helping it understand when and how to invoke the tool.
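For illustration, the definition the server advertises for this tool looks roughly like the structure below (abridged). Nothing here is hand-written; it is derived from the signature and docstring:

# Roughly what the server advertises for get_forecast in a tools/list response (abridged).
get_forecast_definition = {
    "name": "get_forecast",
    "description": "Fetch weather forecast for a location. ...",
    "inputSchema": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "days": {"type": "integer", "default": 3},
        },
        "required": ["location"],
    },
}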

More sophisticated tools can request a Context object to provide feedback during execution. When processing long-running operations like downloading large datasets or running database migrations, agents need to communicate progress back to users:

from mcp.server.fastmcp import Context

@mcp.tool()
async def process_dataset(ctx: Context, file_path: str) -> str:
    """Process a large dataset file."""
    await ctx.info(f"Loading {file_path}…")
    # Processing logic
    await ctx.info("Processing 50% complete")
    # More processing
    return "Dataset processed successfully"

These informational messages appear in the user interface, transforming opaque agent operations into transparent workflows. Users can see that their agent is actively working rather than wondering if it’s hung.

TypeScript SDK

The TypeScript SDK follows similar patterns but leverages Zod for runtime schema validation. For web-based agents or Node.js environments, it provides native support for both stdio and Server-Sent Events transport:

import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";

// 1. Initialize the Server
const server = new Server(
  { name: "analytics-server", version: "1.0.0" },
  { capabilities: { tools: {} } }
);

// 2. The "Discovery" Handler: Tells the LLM what tools exist
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: "query_metrics",
      description: "Query analytics metrics",
      inputSchema: {
        type: "object",
        properties: {
          metric: { type: "string" },
          timeRange: { type: "string" },
        },
        required: ["metric"],
      },
    },
  ],
}));

// 3. The "Execution" Handler: Runs the logic when the LLM calls the tool
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === "query_metrics") {
    const { metric } = request.params.arguments as { metric: string };

    // Logic to fetch from your database or API
    return {
      content: [
        {
          type: "text",
          text: `Fetched metrics for ${metric}: 42 units.`,
        },
      ],
    };
  }
  throw new Error(`Tool not found: ${request.params.name}`);
});

// 4. Connect the Transport
const transport = new StdioServerTransport();
await server.connect(transport);

The SDK handles capability negotiation, protocol version management, and connection lifecycle. Developers define tool handlers that receive validated input and return structured output, trusting the framework to manage serialization and error propagation.

Writing Tool Descriptions That Models Actually Understand

The quality of your tool descriptions directly determines whether agents invoke them correctly. Models don’t execute code to understand what a tool does — they read the description field and the inputSchema to decide when and how to call it. Vague or ambiguous descriptions lead to hallucinated tool calls, incorrect arguments, or tools that never get invoked when they should.

Apply technical writing principles to your descriptions:

Be specific about when to use the tool, not just what it does. Instead of “Search the knowledge base,” write “Search the knowledge base when users ask questions about product documentation, API references, or troubleshooting guides. Do not use for general web search.”

Define constraints explicitly. If a parameter accepts only specific values, list them. If there’s a format requirement (ISO dates, currency codes), specify it in the parameter description, not just the schema type.

Front-load critical information. Models process descriptions linearly. Put the most important details first: “Query production database metrics. WARNING: This runs directly against the production database. Use read-only queries only.”

Test with your golden prompt set. The next section covers debugging, where you’ll validate that your descriptions actually guide the model to invoke tools correctly. Well-written descriptions significantly reduce the iteration cycles needed to achieve reliable tool usage.
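Putting these principles together, a tool description might look like the sketch below, assuming the FastMCP server from the earlier examples (the knowledge-base backend and parameter values are hypothetical):

@mcp.tool()
def search_knowledge_base(query: str, doc_type: str = "all") -> dict:
    """Search the internal knowledge base when users ask about product
    documentation, API references, or troubleshooting guides. Do NOT use
    for general web search.

    Args:
        query: Natural-language search query.
        doc_type: One of "all", "api", "docs", "troubleshooting".
    """
    # Hypothetical backend call; replace with your own search implementation.
    return {"query": query, "doc_type": doc_type, "results": []}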

MCP Debugging and Development Workflow

Protocol development introduces opacity. When an agent fails to invoke a tool correctly, is the problem in the model’s reasoning, the tool’s schema definition, or the server’s execution logic? The MCP Inspector provides a universal client for debugging these interactions.

The Inspector connects to a running MCP server and exposes a GUI for browsing available Resources, listing Prompts, and manually invoking Tools. Developers can test that their schema definitions are sufficiently clear by attempting to trigger tools with natural language descriptions. If the Inspector can’t determine which tool to call based on the description, neither will a production agent.

More critically, the Inspector displays the raw JSON-RPC message log. When a tool invocation fails, developers can examine the exact request payload the client sent and the response the server returned. This visibility is essential for catching serialization errors, type mismatches, or protocol violations that would otherwise manifest as cryptic failures in production.

A common debugging pattern involves creating a “golden prompt set” — a collection of user queries that should trigger specific tools. Running these prompts through the Inspector validates that tool descriptions are unambiguous and that argument schemas match what the model naturally generates. For a send_email tool, the golden set might include “Email the Q4 report to the finance team” (should trigger) and “Show me all emails from last week” (should not trigger). If the model confuses these cases, the tool description needs refinement.
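A golden prompt set can be as simple as a table of prompts and expected outcomes. The sketch below assumes you record which tool the model actually selected for each prompt (from the Inspector or from traced runs) and score the results:

# Golden prompt set for a send_email tool, expressed as test data.
GOLDEN_PROMPTS = [
    {"prompt": "Email the Q4 report to the finance team", "expected_tool": "send_email"},
    {"prompt": "Show me all emails from last week", "expected_tool": None},
    {"prompt": "Send Maria the updated onboarding doc", "expected_tool": "send_email"},
]

def score(selected_tools: dict) -> float:
    """Fraction of prompts where the model picked the expected tool.

    selected_tools maps each prompt to the tool the model actually chose
    (None if no tool was invoked).
    """
    hits = sum(
        1 for case in GOLDEN_PROMPTS
        if selected_tools.get(case["prompt"]) == case["expected_tool"]
    )
    return hits / len(GOLDEN_PROMPTS)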

Constraints and Security

Before examining how organizations deploy MCP in production, it’s important to understand two fundamental constraints that shape every implementation: context window limits and security attack surfaces.

Handling MCP-Induced Context Bloat

MCP solves connectivity but introduces new challenges. The most immediate is context window bloat. Loading schema definitions for twenty tools can consume 15,000 tokens before any actual user query gets processed. This increases both latency and cost, particularly for models with expensive input token pricing.

Passing intermediate results through the model compounds this problem. If an agent queries a database and receives a 10MB CSV file, processing that data by feeding it back through the LLM is prohibitively expensive. The current architecture requires shuttling data through the model even when the agent just needs to perform deterministic operations like filtering rows or calculating averages.

The ecosystem is evolving toward code execution to address these constraints. Instead of the model generating JSON tool calls that the host executes sequentially, the model writes scripts (typically TypeScript or Python) that import MCP tools as libraries. The script executes in a sandboxed environment, allowing the agent to write loops, process large datasets locally, and return only final summaries. Anthropic reports this approach can reduce token consumption by 98% for data-heavy operations.
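What such a model-generated script might look like, assuming the sandbox exposes MCP tools as an importable module (the mcp_tools wrapper and its functions are hypothetical):

# Hypothetical model-generated script running in a sandboxed environment.
from mcp_tools import sales  # assumed wrapper that proxies calls to an MCP server

rows = sales.export_orders(quarter="Q4")  # large result set stays local to the sandbox

# Filter and aggregate here instead of streaming every row through the model.
enterprise = [r for r in rows if r["segment"] == "enterprise"]
total_revenue = sum(r["amount"] for r in enterprise)

# Only this compact summary returns to the model's context window.
print({"enterprise_orders": len(enterprise), "enterprise_revenue": total_revenue})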

Cloudflare’s implementation, for example, uses V8 isolates rather than container-based sandboxes. These lightweight JavaScript runtimes spin up in milliseconds, allowing fresh, isolated environments for every code block execution. This represents a significant performance improvement over container orchestration for short-lived computations.

Hybrid Context Management Strategies

The context bloat problem highlights why production agents rarely rely on a single context retrieval mechanism. Teams building reliable agentic systems typically layer multiple context strategies based on information type and access patterns.

Static context, like system prompts and tool definitions, gets loaded once per session. This is where MCP schema definitions live, and where careful optimization pays compound dividends across every interaction.

Dynamic structured context, like database records or API responses, flows through MCP Resources and Tools. The key characteristic is determinism: when you request a specific resource, you get exactly that resource.

Dynamic unstructured context, like relevant documentation or historical conversation snippets, typically comes from RAG pipelines or vector search. The retrieval is probabilistic, returning content that’s semantically similar to the query rather than an exact match.

The art of context engineering lies in routing each information need to the appropriate retrieval mechanism. A well-designed agent might maintain a small, carefully curated set of MCP tools (avoiding schema bloat), use RAG to identify relevant background documents when needed, and cache frequently accessed Resources to reduce redundant fetches. Observability platforms become essential for understanding which context sources actually contribute to successful task completion versus which add noise and cost.

MCP Security Considerations

Connecting probabilistic reasoning engines to deterministic execution environments expands attack surfaces significantly. Command injection vulnerabilities occur when tools pass model-generated strings directly to shell commands without sanitization. Path traversal attacks happen when file-reading tools lack rigorous validation, allowing requests like ../../../../etc/passwd to expose system data.
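A minimal sketch of the validation a file-reading tool needs before touching the filesystem (the workspace directory is illustrative):

from pathlib import Path

ALLOWED_ROOT = Path("/srv/agent-workspace").resolve()

def safe_read(requested_path: str) -> str:
    """Read a file only if it resolves inside the allowed workspace root."""
    resolved = (ALLOWED_ROOT / requested_path).resolve()
    if not resolved.is_relative_to(ALLOWED_ROOT):  # blocks ../../../../etc/passwd
        raise ValueError(f"Path escapes workspace: {requested_path}")
    return resolved.read_text()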

The “confused deputy” problem is particularly acute in multi-server environments. An agent with access to both a public web search server and a private email server can be tricked by prompt injections embedded in web content. The malicious content might instruct the agent to exfiltrate data using the legitimate email server. The email server acts correctly according to its authorization but under the influence of external input.

Defense requires multiple layers because no single mitigation addresses all attack vectors. MCP servers should run in isolated environments with restricted filesystem access, preventing path traversal attacks from exposing sensitive system data. Tools marked as destructive — those that write, delete, or modify state — must enforce user confirmation at the client level, creating a human-in-the-loop checkpoint that blocks automated exploitation. Network policies need strict allowlists to prevent data exfiltration, ensuring that even if an agent is compromised through prompt injection, it cannot contact external command-and-control servers. Observability platforms that trace tool execution provide the final layer, detecting anomalous patterns like unusual tool invocation sequences or unexpected parameter values that indicate potential compromise.

MCP Implementation Across Different Contexts

With the protocol mechanics, development workflow, and constraints established, we can examine how MCP gets deployed across different organizational contexts. Three distinct patterns have emerged: developer tooling integration, enterprise platform deployment, and consumer marketplace extensions.

Developer Tooling: Observability as a First-Class Primitive

Production agentic systems require deep observability. When an agent makes dozens of tool calls across multiple MCP servers to accomplish a task, operators need visibility into execution paths, latency distributions, and failure modes. Opik’s MCP server integration demonstrates how observability platforms can expose trace data and optimization capabilities directly through the protocol.

The Opik MCP server connects IDEs like Cursor or Windsurf to the Opik observability platform, transforming code editors into prompt engineering workstations. Developers can query traces using natural language: “Show me the last 10 traces where the checkout tool failed.” The MCP server fetches matching traces and presents them in the editor, allowing engineers to inspect exact inputs and outputs that led to failures without switching contexts between development and LLM monitoring tools.

This integration extends to prompt management. A developer iterating on a production prompt can select text in their editor and invoke “Save as production-prompt-v3” through an MCP tool. The prompt gets versioned and stored in Opik’s registry, maintaining synchronization between the codebase and the prompt library. This workflow reduces the friction of prompt iteration, encouraging teams to experiment and optimize.

Beyond trace inspection, Opik applies MCP to algorithmic optimization of agent behavior through its MetaPrompt Optimizer. The system parses MCP server manifests to understand available tools, their dependencies, and parameter constraints. It identifies usage patterns like tools that agents frequently invoke incorrectly or parameters that consistently get omitted. Based on this analysis, the optimizer refines both the natural language instructions on when to use tools and the schema descriptions explaining how to use them.

This represents a second-order effect of standardization. By defining tools in a machine-readable format through MCP, we enable algorithmic analysis and optimization that was impossible with proprietary integration code. The optimizer might detect that an agent consistently fails to provide a required currency code for financial tools and automatically rewrite the tool description to explicitly demand this parameter. The refined schemas improve agent reliability without requiring manual prompt engineering. Learn more about Opik’s tool optimization algorithms.

Enterprise Deployment Patterns

Enterprise adoption of MCP requires addressing concerns that go beyond the protocol itself: orchestration, observability, and integration with existing infrastructure. IBM’s implementation across their Watsonx and BeeAI platforms illustrates the architectural patterns emerging as large organizations deploy MCP in production environments.

Orchestration layers manage MCP connection lifecycle, handling session timeouts, error recovery, and protocol negotiation. IBM’s BeeAI platform serves as this orchestration layer, providing the runtime that binds together cognitive frameworks (like LangChain or LangGraph) with MCP’s data interfaces. This separation of concerns lets developers focus on agent logic rather than connection plumbing.

Deep telemetry becomes mandatory for enterprise compliance and debugging. Production agentic systems need session replay capabilities showing exactly when MCP tools were invoked, the latency of each call, tokens consumed, and routing decisions. IBM Telemetry in Watsonx Orchestrate provides this visibility, while other organizations implement similar LLM tracing through LLM evaluation frameworks like Opik (covered in the previous section).

Internal MCP servers expose company-specific data and workflows. A common starting pattern is building servers that search internal knowledge bases or documentation. IBM’s tutorials demonstrate creating a server that searches their documentation library using FastMCP, fetching a JSON index from GitHub and performing case-insensitive searches. Once added to an IDE’s configuration, the coding assistant can access internal documentation to answer technical questions.

The key architectural insight applies universally: MCP complements rather than replaces orchestration frameworks. LangChain and LangGraph define how an agent reasons, plans, and makes decisions — the cognitive architecture. MCP standardizes the interface to data and tools. The orchestration layer binds these together.

MCP in Consumer Applications

OpenAI’s integration of MCP through its Apps SDK focuses on the consumer marketplace experience. While MCP standardizes data and tool access, OpenAI extends the concept to include UI rendering, creating a more integrated user experience.

The Apps SDK allows MCP servers to return Resources with a special MIME type: text/html+skybridge. When ChatGPT receives a Resource of this type, it renders the content as a sandboxed HTML/JavaScript widget within the chat interface. The widget communicates with ChatGPT through a window.openai JavaScript API. When a tool executes, results get injected into the widget via window.openai.toolOutput, creating a seamless loop where conversational interaction drives UI updates.

OpenAI defines three distinct types of state for managing this interaction model. Business state — the authoritative data like database records — lives on the MCP server. UI state — ephemeral choices like which tab is selected or sort order — lives in the widget instance and resets if the user regenerates the response. Cross-session state — user preferences that persist across conversations — requires developers to implement their own backend storage and handle OAuth-based authentication.

This architecture enables unique features like instant checkout for commercial applications. An MCP tool can return a checkout_session payload containing line items and tax calculations. The widget calls requestCheckout(), triggering ChatGPT’s native payment flow using stored user credentials. Upon completion, ChatGPT invokes a complete_checkout tool on the developer’s MCP server to finalize the order. This pattern requires careful idempotent design to prevent double-charging if the tool gets called multiple times.
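The core of that idempotent design is keying completion on the checkout session ID so replays return the original result instead of charging again. A hedged sketch, assuming a FastMCP server instance named mcp and an in-memory store standing in for a real database:

# Hypothetical idempotent completion handler keyed on the checkout session ID.
_completed_orders: dict = {}  # in production, a persistent database table

@mcp.tool()
def complete_checkout(checkout_session_id: str) -> dict:
    """Finalize an order exactly once, even if invoked multiple times."""
    if checkout_session_id in _completed_orders:
        # Replay: return the original confirmation without charging again.
        return _completed_orders[checkout_session_id]

    order = {"session": checkout_session_id, "status": "confirmed"}  # charge happens here
    _completed_orders[checkout_session_id] = order
    return order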

To ensure ChatGPT correctly identifies when to invoke specific MCP apps, OpenAI prescribes a metadata optimization process. Developers create a “golden prompt set” containing direct intents (explicit requests), indirect intents (implied needs), and negative examples (irrelevant queries). Running these prompts in Developer Mode calculates recall (did it trigger when it should?) and precision (did it avoid triggering when it shouldn’t?). OpenAI supports specific annotations like readOnlyHint for search tools and destructiveHint for operations that modify state, controlling whether ChatGPT requests explicit user confirmation before execution.

Adopting MCP for Connected AI Applications

The Model Context Protocol has fundamentally altered how we build AI applications. By standardizing the interface between models and data, it enables the transition from isolated language models to connected agents capable of navigating complex systems. The establishment of the Agentic AI Foundation ensures this standard remains neutral and vendor-agnostic, preventing the ecosystem fragmentation that has hampered previous integration standards.

As the protocol evolves toward asynchronous operations and enhanced server identity verification, MCP is positioning itself as foundational infrastructure for agentic AI — the layer that allows digital intelligence to sense, reason, and act upon the world. The architectural patterns emerging around code execution, progressive context loading, and algorithmic optimization suggest we’re still in the early phases of understanding what becomes possible when intelligence and context truly connect.

Building agentic AI requires more than just connecting models to data. You need visibility into how those connections perform in production, the ability to optimize tool definitions based on real usage patterns, and workflows that let developers iterate on prompts while maintaining version control. Opik provides this LLM observability and optimization layer for MCP-based agents, with a dedicated MCP server that integrates directly into your development environment and algorithmic tools that refine agent behavior based on traced execution patterns. Explore Opik’s MCP integration to see how observability becomes a first-class primitive in your agentic architecture, or learn how tool optimization can systematically improve your agents’ reliability and efficiency.

Sharon Campbell-Crow

With over 14 years of experience as a technical writer, Sharon has worked with leading teams at Snorkel AI and Google, specializing in translating complex tools and processes into clear, accessible content for audiences of all levels.