Why each MCP tool is attack surface
An MCP server advertises its capabilities as a list of tools, each with a name, a natural-language description, and a parameter schema. When a client connects, it loads those descriptions directly into the model's context so the model knows when and how to call each tool. That design is the core of the risk: the tool description is not inert metadata, it is text the model reads and acts on.
A server you connect to therefore gets to inject content into your agent's reasoning before you have invoked anything, and the data a tool returns at runtime is similarly trusted by the model. Because servers can be remote, third-party, or community-published, connecting one is closer to running someone else's code than to calling a documented API. The blast radius is whatever the agent can do: other connected tools, local files, credentials, and outbound network access.
Tool poisoning and line jumping
In April 2025, Invariant Labs demonstrated "tool poisoning," in which a malicious server hides instructions inside a tool's description, often disguised as comments or formatting the user never sees. A capable model reads those instructions and follows them, exfiltrating files or secrets while returning an innocuous-looking result. Trail of Bits generalized this as "line jumping": because clients load all tool descriptions into context on connection, a server can inject behavior-altering text before any tool is approved or called, bypassing the per-invocation consent MCP relies on.
Simon Willison has repeatedly framed the underlying issue as the "lethal trifecta" — an agent with access to private data, exposure to untrusted content, and the ability to communicate externally is exploitable by design, and MCP makes assembling that trifecta trivial. The practical consequence is that descriptions and tool outputs from any server must be treated as untrusted input, not as trusted configuration.
The confused deputy and token passthrough
Many MCP servers act as a bridge to an upstream API, holding or forwarding OAuth tokens on the user's behalf. The dangerous anti-pattern is "token passthrough": an MCP server accepting a token and forwarding it, unmodified, to a downstream service that has no way to know the token was not minted for it. This creates a classic confused-deputy condition where the downstream API over-trusts a token, enabling cross-service privilege escalation.
The MCP specification now addresses this directly: servers MUST validate that access tokens were issued specifically for them as the audience (RFC 8707), MUST reject tokens that are not, and MUST NOT accept or transit tokens intended for other services. In practice, a server should validate the inbound token's audience and then mint or look up a separate, audience-scoped token for each downstream call rather than replaying the caller's credential.
Rug pulls and over-broad tools
MCP does not mandate that tool definitions be immutable, and most clients approve a tool once and do not re-validate it on later sessions. This enables the "rug pull": a server presents a benign tool, earns approval, then silently changes the tool's definition or behavior afterward. The risk is not theoretical — CVE-2025-54136 covered a rug-pull-class flaw, and in September 2025 a malicious postmark-mcp npm package silently BCC'd users' email to an attacker for weeks.
A related problem is over-broad tool design: servers that expose "run arbitrary SQL," "execute shell command," or "make any HTTP request" hand the model — and anyone who can steer it — a general-purpose capability. Such tools, and naive implementations that interpolate model-supplied arguments into shell or SQL strings, have produced straightforward command- and SQL-injection vulnerabilities in real MCP servers.
Building a defensive posture
Defense starts with treating MCP servers as untrusted dependencies subject to normal supply-chain review. Pin and review tool definitions, and detect changes: Trail of Bits' mcp-context-protector applies trust-on-first-use pinning so a silently altered definition is flagged rather than silently re-approved. Scope authorization per tool and per server, follow least privilege on OAuth scopes, and enforce audience-bound tokens as the spec requires.
Prefer narrow, purpose-specific tools over arbitrary SQL/shell/HTTP primitives, and where broad access is unavoidable, gate it behind allowlists and human approval. Keep approval gates meaningful by surfacing the full tool description and arguments to the user, add a guardrail layer for injection patterns in descriptions and outputs, and log every tool call with its arguments and results. No single control is sufficient; the goal is defense in depth around an architecture that trusts text by default.
Every MCP server your agent trusts is third-party code with a direct line into the model's reasoning — pin it, scope it, gate it, and audit it.