What is the confused deputy problem in AI agents and how do I prevent it?

Question

Accepted Answer

The confused deputy problem in AI agents occurs when an agent that holds legitimate permissions and acts on behalf of a user is tricked into using those permissions for an action that the requesting party is not authorized to perform, or that deviates from the user's original intent. This typically happens when the agent's reasoning is compromised, for example by prompt injection, so that its actions no longer match the session's purpose.

To prevent the confused deputy problem, implement the following controls:

Attenuated Delegation (OWASP LLM Top 10: L3, L7): When an agent delegates a subtask to another agent, the receiving agent should receive a capability token that is strictly narrower than the delegating agent's, scoped only to the specific subtask.
Intent Re-derivation and Attestation (OWASP LLM Top 10: L1, L3): Authorization should be re-derived from the originally attested intent rather than solely from the agent’s current reasoning, especially to counter intent drift through prompt injection. Vague tasks should fail attestation rather than being elaborated into plausible-sounding goals.
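Composition-Aware Policy (OWASP LLM Top 10: L3, L7): For tool composition, implement policies that reason about effect chains rather than just individual tool permissions, so that a sequence of individually permitted calls cannot add up to privilege escalation. For example, a tool that reads internal files and a tool that sends outbound HTTP requests may each be allowed on their own, yet chaining them amounts to data exfiltration.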
Secure Identity and Intent Chain (NIST AI RMF: Govern): Bind workload identity, delegated user identity, active intent token, and a per-action capability token at every consequential action. The capability token passed to downstream services should be short-lived, narrowly scoped, and embed the intent ID. Avoid anti-patterns such as omnipotent agent service accounts or passing user credentials directly to the agent; instead, use token exchange for narrower delegated credentials.
Trust Guardian for Tool Invocations (NIST AI RMF: Govern; NIST CSF: Protect): Place a Trust Guardian in front of Model Context Protocol (MCP) tool invocations to evaluate each action against the declared intent and a deterministic policy floor before execution. Implement fine-grained policies to control which agents can invoke which MCP servers.
Attenuated Capability Tokens and Audit Logs for Agent-to-Agent Handoffs (OWASP LLM Top 10: L7, L6): For agent-to-agent confused deputy scenarios, use attenuated capability tokens across handoffs and maintain audit logs at every handoff. For custom-built agents, use an SDK that wraps every tool call and agent-to-agent handoff, sending hop context to an IBAC Judge at each step. When crossing a trust boundary, issue a short-lived, cryptographically signed transaction token that carries the user's verified identity, the agent chain, and the session's original declared intent.
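
To make the token-related controls above more concrete, here is a minimal sketch of attenuated, short-lived, intent-bound capability tokens with a guard that runs before each tool call. It is an illustration under stated assumptions, not a production design: the function names (issue_token, attenuate, authorize), the scope strings, the agent names, and the HMAC-with-shared-secret signing are all hypothetical choices made for this example. A real deployment would more likely use an established token format and asymmetric keys held in a secret store.

```python
import base64
import hashlib
import hmac
import json
import time

# Assumption: a demo shared secret. Real systems would use managed keys.
SECRET = b"demo-signing-key"


def _sign(payload: bytes) -> str:
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def issue_token(subject, scopes, intent_id, ttl_seconds, chain=()):
    """Issue a signed, short-lived capability token bound to an intent ID."""
    claims = {
        "sub": subject,
        "scopes": sorted(scopes),
        "intent_id": intent_id,
        "exp": time.time() + ttl_seconds,
        "chain": list(chain) + [subject],  # agent chain so far
    }
    payload = json.dumps(claims, sort_keys=True).encode()
    return base64.urlsafe_b64encode(payload).decode() + "." + _sign(payload)


def verify_token(token):
    """Return the claims if the signature and expiry are valid."""
    try:
        encoded, signature = token.rsplit(".", 1)
        payload = base64.urlsafe_b64decode(encoded.encode())
    except ValueError:
        raise PermissionError("malformed token")
    if not hmac.compare_digest(signature.encode(), _sign(payload).encode()):
        raise PermissionError("bad signature")
    claims = json.loads(payload)
    if claims["exp"] < time.time():
        raise PermissionError("token expired")
    return claims


def attenuate(parent_token, delegate, scopes, ttl_seconds):
    """Derive a strictly narrower token for a sub-agent's subtask."""
    parent = verify_token(parent_token)
    requested = set(scopes)
    if not requested < set(parent["scopes"]):
        raise PermissionError("delegated scopes must be a strict subset")
    # The child can never outlive the parent token.
    remaining = parent["exp"] - time.time()
    return issue_token(delegate, requested, parent["intent_id"],
                       min(ttl_seconds, remaining), chain=parent["chain"])


def authorize(token, required_scope, intent_id):
    """Guard run before a tool call: check intent binding and scope."""
    claims = verify_token(token)
    if claims["intent_id"] != intent_id:
        raise PermissionError("token is bound to a different intent")
    if required_scope not in claims["scopes"]:
        raise PermissionError(f"scope {required_scope!r} not granted")
    return claims


# Usage: the planner delegates only calendar access to a sub-agent.
root = issue_token("planner-agent", {"calendar:read", "email:send"},
                   "intent-123", ttl_seconds=300)
child = attenuate(root, "scheduler-agent", {"calendar:read"}, ttl_seconds=60)
authorize(child, "calendar:read", "intent-123")  # allowed

try:
    authorize(child, "email:send", "intent-123")  # outside delegated scope
except PermissionError as err:
    print("denied:", err)
```

Because the guard checks the token rather than the agent's own reasoning, a sub-agent that has been steered by injected instructions still cannot send email here: its token never carried that scope, and it cannot mint a broader one without the signing key. The chain claim records each hop, which is the kind of data an audit log at every handoff would capture.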
