MCP Security Checklist

4 min read · 18-point list

The Model Context Protocol (MCP) gives AI assistants a standard way to reach external tools and data, and adoption has been explosive since its late-2024 release. But every MCP server an agent trusts becomes part of that agent's attack surface, and 2025 produced a steady stream of research showing how tool descriptions, OAuth tokens, and over-broad capabilities can be turned against the user.

Why each MCP tool is attack surface

An MCP server advertises its capabilities as a list of tools, each with a name, a natural-language description, and a parameter schema. When a client connects, it loads those descriptions directly into the model's context so the model knows when and how to call each tool. That design is the core of the risk: the tool description is not inert metadata, it is text the model reads and acts on.

A server you connect to therefore gets to inject content into your agent's reasoning before you have invoked anything, and the data a tool returns at runtime is similarly trusted by the model. Because servers can be remote, third-party, or community-published, connecting one is closer to running someone else's code than to calling a documented API. The blast radius is whatever the agent can do: other connected tools, local files, credentials, and outbound network access.

Tool poisoning and line jumping

In April 2025, Invariant Labs demonstrated "tool poisoning," in which a malicious server hides instructions inside a tool's description, often disguised as comments or formatting the user never sees. A capable model reads those instructions and follows them, exfiltrating files or secrets while returning an innocuous-looking result. Trail of Bits generalized this as "line jumping": because clients load all tool descriptions into context on connection, a server can inject behavior-altering text before any tool is approved or called, bypassing the per-invocation consent MCP relies on.

Simon Willison has repeatedly framed the underlying issue as the "lethal trifecta": an agent with access to private data, exposure to untrusted content, and the ability to communicate externally is exploitable by design, and MCP makes assembling that trifecta trivial. The practical consequence is that descriptions and tool outputs from any server must be treated as untrusted input, not as trusted configuration.
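Treating descriptions as untrusted input can start with a simple screening pass before a description ever reaches the model. The sketch below is a heuristic only: the pattern list and the function name `find_red_flags` are illustrative assumptions, not part of MCP or of any published scanner, and a determined attacker can evade keyword matching.

```python
import re
import unicodedata

# Illustrative patterns only; a real deployment would tune and extend these.
SUSPICIOUS_PATTERNS = [
    re.compile(r"<\s*(important|system|secret)\s*>", re.IGNORECASE),
    re.compile(r"\bignore (all |any )?(previous|prior) instructions\b", re.IGNORECASE),
    re.compile(r"\bdo not (tell|mention|inform)\b.*\buser\b", re.IGNORECASE),
    re.compile(r"~/\.ssh|id_rsa|\.env\b", re.IGNORECASE),
]


def find_red_flags(description: str) -> list[str]:
    """Return human-readable reasons a tool description looks suspicious."""
    flags = []
    # Invisible format characters (zero-width spaces, bidi controls) can hide text.
    if any(unicodedata.category(ch) == "Cf" for ch in description):
        flags.append("contains invisible format characters")
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern.search(description):
            flags.append(f"matches pattern: {pattern.pattern}")
    return flags
```

An empty list means only that none of these particular heuristics fired, so a screen like this belongs alongside pinning and approval gates rather than in place of them.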

The confused deputy and token passthrough

Many MCP servers act as a bridge to an upstream API, holding or forwarding OAuth tokens on the user's behalf. The dangerous anti-pattern is "token passthrough": an MCP server accepting a token and forwarding it, unmodified, to a downstream service that has no way to know the token was not minted for it. This creates a classic confused-deputy condition where the downstream API over-trusts a token, enabling cross-service privilege escalation.

The MCP specification now addresses this directly: servers MUST validate that access tokens were issued specifically for them as the audience (RFC 8707), MUST reject tokens that are not, and MUST NOT accept or transit tokens intended for other services. In practice, a server should validate the inbound token's audience and then mint or look up a separate, audience-scoped token for each downstream call rather than replaying the caller's credential.
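The audience rule can be sketched as follows. This is a minimal illustration, assuming the token's signature and expiry have already been verified by a proper JWT library and that `claims` is the resulting decoded payload. `SERVER_AUDIENCE`, `TokenRejected`, and `token_store` are hypothetical names introduced for the example.

```python
# Hypothetical identifier for this MCP server; not taken from any real deployment.
SERVER_AUDIENCE = "https://mcp.example.com"


class TokenRejected(Exception):
    """Raised when an inbound token was not issued for this server."""


def audience_matches(claims: dict, expected: str) -> bool:
    """The JWT 'aud' claim may be a single string or a list of strings."""
    aud = claims.get("aud")
    if isinstance(aud, str):
        return aud == expected
    if isinstance(aud, list):
        return expected in aud
    return False


def authorize_inbound(claims: dict) -> dict:
    """Accept only already-verified claims whose audience is this server."""
    if not audience_matches(claims, SERVER_AUDIENCE):
        raise TokenRejected("token audience is not this server")
    return claims


def downstream_credential(claims: dict, downstream: str, token_store: dict) -> str:
    """Look up a separate credential for the downstream service.

    The caller's own token is never forwarded. token_store is a hypothetical
    mapping of (subject, downstream audience) -> token obtained separately.
    """
    authorize_inbound(claims)
    return token_store[(claims["sub"], downstream)]
```

The point of the design is that the inbound token and the downstream credential are different objects: a token minted for another service is rejected at the door, and nothing the caller presents is replayed upstream.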

Rug pulls and over-broad tools

MCP does not mandate that tool definitions be immutable, and most clients approve a tool once and do not re-validate it on later sessions. This enables the "rug pull": a server presents a benign tool, earns approval, then silently changes the tool's definition or behavior afterward. The risk is not theoretical. CVE-2025-54136, reported against the Cursor editor, covered a rug-pull-class flaw in which an already-approved MCP configuration could be changed without triggering re-approval, and in September 2025 a malicious postmark-mcp npm package silently BCC'd users' email to an attacker for about two weeks.

A related problem is over-broad tool design: servers that expose "run arbitrary SQL," "execute shell command," or "make any HTTP request" hand a general-purpose capability to the model and to anyone who can steer it. Such tools, along with naive implementations that interpolate model-supplied arguments into shell or SQL strings, have produced straightforward command- and SQL-injection vulnerabilities in real MCP servers.
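The interpolation problem is easy to reproduce. The sketch below uses Python's standard `sqlite3` module with a made-up `customers` table. The function names are illustrative, and the "unsafe" variant is included only to show what goes wrong when a model-supplied argument is pasted into the SQL text.

```python
import sqlite3


def find_customer_unsafe(conn, name):
    # Anti-pattern: the model-supplied value becomes part of the SQL text.
    query = f"SELECT id, name FROM customers WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_customer(conn, name):
    # The value is bound as a parameter, so it is never parsed as SQL.
    query = "SELECT id, name FROM customers WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("alice",), ("bob",)])

# A classic injection payload a steered model might pass as the "name" argument.
payload = "x' OR '1'='1"
print(len(find_customer_unsafe(conn, payload)))  # every row leaks
print(find_customer(conn, payload))              # no customer has that literal name
```

A narrow tool such as `find_customer` also limits what a steered model can ask for at all, which is the stronger half of the fix.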

Building a defensive posture

Defense starts with treating MCP servers as untrusted dependencies subject to normal supply-chain review. Pin and review tool definitions, and detect changes: Trail of Bits' mcp-context-protector applies trust-on-first-use pinning so that an altered definition is flagged for review instead of being re-approved without notice. Scope authorization per tool and per server, follow least privilege on OAuth scopes, and enforce audience-bound tokens as the spec requires.
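Trust-on-first-use pinning can be sketched in a few lines. This illustrates the general idea only and is not how mcp-context-protector is implemented. The `fingerprint` and `check_tools` names and the in-memory `pins` dictionary are assumptions made for the example.

```python
import hashlib
import json


def fingerprint(tool: dict) -> str:
    """Hash a canonical JSON form of the tool definition."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_tools(server: str, tools: list[dict], pins: dict) -> list[str]:
    """Pin unseen tools on first use; return names whose definition changed."""
    changed = []
    for tool in tools:
        key = f"{server}/{tool['name']}"
        digest = fingerprint(tool)
        if key not in pins:
            pins[key] = digest            # trust on first use
        elif pins[key] != digest:
            changed.append(tool["name"])  # needs fresh human review
    return changed
```

A changed tool keeps its old pin until someone reviews and accepts the new definition, so the mismatch is reported on every connection rather than only once. In practice the pins would be persisted between sessions.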

Prefer narrow, purpose-specific tools over arbitrary SQL/shell/HTTP primitives, and where broad access is unavoidable, gate it behind allowlists and human approval. Keep approval gates meaningful by surfacing the full tool description and arguments to the user, add a guardrail layer for injection patterns in descriptions and outputs, and log every tool call with its arguments and results. No single control is sufficient; the goal is defense in depth around an architecture that trusts text by default.
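An allowlist, an approval gate, and call logging can live in one small dispatch wrapper. The sketch below is a hypothetical client-side shim rather than an MCP SDK API. The tool names, the `approve` callback, and the `handlers` mapping are all assumptions made for illustration.

```python
import json
import logging

logger = logging.getLogger("mcp.audit")

# Hypothetical policy: the tool names in these sets are illustrative only.
READ_ONLY_TOOLS = {"search_docs", "get_invoice"}
MUTATING_TOOLS = {"send_email", "delete_invoice"}


def call_tool(name, args, handlers, approve):
    """Run a tool only if policy allows it, logging the outcome either way.

    handlers maps tool name -> callable; approve(name, args) -> bool asks a human.
    """
    shown_args = json.dumps(args, default=str)
    if name not in READ_ONLY_TOOLS | MUTATING_TOOLS:
        logger.warning("denied %s args=%s", name, shown_args)
        raise PermissionError(f"tool not on allowlist: {name}")
    if name in MUTATING_TOOLS and not approve(name, args):
        logger.warning("rejected %s args=%s", name, shown_args)
        raise PermissionError(f"approval refused: {name}")
    result = handlers[name](**args)
    logger.info("called %s args=%s result=%s", name, shown_args,
                json.dumps(result, default=str))
    return result
```

The `approve` callback receives the full argument set, which is what keeps the gate meaningful: the human sees exactly what will run, not a summary the model wrote.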

Key takeaway

Every MCP server your agent trusts is third-party code with a direct line into the model's reasoning: pin it, scope it, gate it, and audit it.

Practical

Put it into practice.

A copy-ready list to apply to your own workflows, tools, and AI usage.

Tool definitions

Review each tool's capability and blast radius before exposing it.
Avoid overly broad tools such as arbitrary SQL, raw shell, or unrestricted HTTP.
Document side effects clearly and separate read tools from mutating ones.

Authentication and authorization

Authenticate clients connecting to the MCP server.
Enforce authorization per tool and per resource, scoped to the user.
Validate token audience (RFC 8707) and never pass through tokens minted for other services.

Allowlists and input validation

Allowlist permitted operations, hosts, paths, and parameters.
Validate and sanitize all tool arguments.
Prevent SSRF in any tool that makes network requests.
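A host allowlist combined with a resolved-address check covers the most common SSRF paths. The sketch below is a minimal illustration using only the standard library. `ALLOWED_HOSTS` and `check_url` are hypothetical names, and the injectable `resolve` parameter exists so the check can be exercised without network access.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration.
ALLOWED_HOSTS = {"api.example.com"}


def is_public_ip(addr: str) -> bool:
    """Reject private, loopback, link-local, and other non-routable ranges."""
    ip = ipaddress.ip_address(addr)
    return not (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_multicast or ip.is_reserved or ip.is_unspecified)


def check_url(url: str, resolve=socket.getaddrinfo) -> None:
    """Raise ValueError unless the URL is https, allowlisted, and public."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("only https is allowed")
    host = parts.hostname
    if host is None or host not in ALLOWED_HOSTS:
        raise ValueError(f"host not on allowlist: {host}")
    # Every address the name resolves to must be public.
    for info in resolve(host, parts.port or 443):
        if not is_public_ip(info[4][0]):
            raise ValueError(f"{host} resolves to a non-public address")
```

This check and the later HTTP request resolve the name separately, so a DNS answer that changes in between can still slip through. Connecting to the already-validated address, and refusing to follow redirects without re-checking, closes that gap.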

Mutation and approval gates

Require approval or confirmation for mutating and destructive tools.
Add guardrails such as dry-runs and limits around state-changing actions.
Make destructive operations reversible or backed up where possible.

Supply chain and tool integrity

Pin and review tool definitions; detect silent changes (rug pulls).
Treat third-party MCP servers as untrusted dependencies subject to review.
Keep secrets out of tool definitions and responses.

Auditing and monitoring

Log all tool invocations with caller identity and arguments.
Monitor for abuse, anomalous calls, and injection-driven tool use.
Version and review tool changes the way you review code.

This is general guidance, not a guarantee of any outcome. Book a call if you would like help applying it to your own business.
