What is MCP tool poisoning and how do I defend against malicious tool descriptions?
MCP tool poisoning occurs when an attacker compromises a Multi-Agent Collaboration Protocol (MCP) server, leading to malicious tool descriptions that instruct an agent to perform unauthorized actions, such as exfiltrating credentials. This can violate compliance posture and bypass sandbox isolation by giving the agent reach into external systems.
To defend against malicious tool descriptions:
- Implement robust ecosystem vetting (OWASP LLM01: Prompt Injection, NIST AI RMF: Govern): Thoroughly vet third-party MCP servers and their suppliers to ensure their security posture.
- Utilize context handling and trust segregation (NIST AI RMF: Govern, Map): Ensure that the agent's context is properly segregated and that instructions from tool descriptions are re-verified for intent at runtime.
- Enforce runtime egress controls (NIST AI RMF: Protect, Govern): Implement controls to prevent agents from calling external URLs with sensitive information, such as environment variables containing credentials.
- Prioritize secure credential handling (NIST AI RMF: Protect): Avoid storing sensitive credentials directly in environment variables that could be exfiltrated. Hermes's MCP client, for example, includes features like credential stripping in error messages to prevent leakage.
- Leverage observability and anomaly detection (NIST AI RMF: Monitor): Continuously monitor agent behavior for anomalies that might indicate unauthorized actions.
- Employ explicit approval gates for dangerous commands (NIST AI RMF: Protect, Govern): Tools should declare their behavioral characteristics, such as
isDestructive, to trigger explicit user confirmation for potentially harmful actions. Theapproval.pymodule in Hermes, for instance, uses dangerous pattern detection and normalization to prevent obfuscation bypasses and requires interactive confirmation for such commands. - Namespace MCP tools for clear provenance (NIST AI RMF: Govern, Monitor): Enforce a naming convention like
mcp__server-name__tool-nameto make the origin of every tool call visible in logs and transcripts, aiding in auditing and incident response.
- Chapter 4: Permission Systems and Safety Guardrails (Claude Code vs. Hermes Agent)
- Designing Agentic AI Systems with the ORCHIDEAS Framework
- Chapter 2: Tool Architecture and the Tool Contract (Claude Code vs. Hermes Agent)
- How to Discover Shadow AI Agents in Your Enterprise
- Chapter 13: MCP Integration — Connecting Agents to the World (Claude Code vs. Hermes Agent)
How does your AI agent score?
Get a free, instant AI agent security readiness snapshot — mapped to NIST, OWASP & ISO — then unlock the full report with a prioritized, cited fix-list.
This AI-generated answer is for guidance only — not a certification, audit, or penetration test. Grounded in the NIST AI RMF, OWASP LLM Top 10, and ISO/IEC 42001 control text; verify applicability to your environment.