What is MCP tool poisoning and how do I defend against malicious tool descriptions?

Question

Accepted Answer

MCP tool poisoning occurs when an attacker compromises a Multi-Agent Collaboration Protocol (MCP) server, leading to malicious tool descriptions that instruct an agent to perform unauthorized actions, such as exfiltrating credentials. This can violate compliance posture and bypass sandbox isolation by giving the agent reach into external systems.

To defend against malicious tool descriptions:

Implement robust ecosystem vetting (OWASP LLM01: Prompt Injection, NIST AI RMF: Govern): Thoroughly vet third-party MCP servers and their suppliers to ensure their security posture.
Utilize context handling and trust segregation (NIST AI RMF: Govern, Map): Ensure that the agent's context is properly segregated and that instructions from tool descriptions are re-verified for intent at runtime.
Enforce runtime egress controls (NIST AI RMF: Protect, Govern): Implement controls to prevent agents from calling external URLs with sensitive information, such as environment variables containing credentials.
Prioritize secure credential handling (NIST AI RMF: Protect): Avoid storing sensitive credentials directly in environment variables that could be exfiltrated. Hermes's MCP client, for example, includes features like credential stripping in error messages to prevent leakage.
Leverage observability and anomaly detection (NIST AI RMF: Monitor): Continuously monitor agent behavior for anomalies that might indicate unauthorized actions.
Employ explicit approval gates for dangerous commands (NIST AI RMF: Protect, Govern): Tools should declare their behavioral characteristics, such as isDestructive, to trigger explicit user confirmation for potentially harmful actions. The approval.py module in Hermes, for instance, uses dangerous pattern detection and normalization to prevent obfuscation bypasses and requires interactive confirmation for such commands.
Namespace MCP tools for clear provenance (NIST AI RMF: Govern, Monitor): Enforce a naming convention like mcp__server-name__tool-name to make the origin of every tool call visible in logs and transcripts, aiding in auditing and incident response.

What is MCP tool poisoning and how do I defend against malicious tool descriptions?

How does your AI agent score?

Related questions