What is the right way to sandbox an AI agent that executes code or shell commands?

Question

Accepted Answer

Sandboxing an AI agent that executes code or shell commands takes several layers: process isolation, granular access controls, and continuous monitoring. A sandbox contains the process, but it does not contain the effects of actions the agent performs through external tools or MCP servers, so those calls need their own controls.

Here are concrete controls for sandboxing AI agents:

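Isolate execution environments: Run code in containers with strict resource limits, ephemeral filesystems, and no access to the agent's credentials. This limits the blast radius of a misbehaving agent. Note that Protect is a function of the NIST Cybersecurity Framework, not of the NIST AI RMF, whose functions are Govern, Map, Measure, and Manage; within the AI RMF this control fits best under Manage.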
Implement Intent-Based Access Control (IBAC): Use a unified IBAC layer, such as agentctl, to normalize heterogeneous tool calls (e.g., Read, code_read, shell) into canonical actions (e.g., read, write, execute). This allows uniform policy enforcement across different agent runtimes and helps mitigate OWASP LLM Top 10 (2023) risk LLM02: Insecure Output Handling, because model output cannot trigger an action until that action has been authorized.
Enforce granular policies: Define Cedar policies that specify what actions (e.g., read, write, execute) a principal (agent/user) can perform on a resource (file/directory/command) under specific environmental conditions (e.g., dev, prod, time of day, IP range). This enables fine-grained control and supports the NIST AI RMF function of Govern by establishing clear rules for agent behavior.
Control outbound network access: Restrict outbound network access from sandboxed environments, allowing it only through a tool broker. This broker should validate each tool call against the agent's identity, active intent, and policy, addressing OWASP LLM Top 10 risk LLM07: Insecure Plugin Design.
Monitor and enumerate continuously: Deploy lightweight monitoring agents at the host and container levels to continuously inventory AI-relevant artifacts, running processes, loaded MCP servers, active LLM connections, and configured tool sets. Running this discovery continuously, and especially at container startup, keeps the inventory current and surfaces "shadow AI agents".
Establish a runtime enforcement architecture: Place an LLM gateway in front of every model invocation and a tool broker at the tool invocation layer to enforce authentication, apply content policies, perform PII detection, and validate tool calls against policies and intent. This provides complete mediation and defense in depth, acting as the last line of defense against emergent threats.
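
As a rough sketch of the isolation control above, the following Python builds a `docker run` command line for a throwaway container. The helper name `build_sandbox_command`, the image name, and the specific limits are illustrative assumptions, not values from this answer; the flags themselves are standard Docker CLI options.

```python
import shlex


def build_sandbox_command(image, command, memory="256m", cpus="0.5", pids_limit=64):
    """Return a `docker run` argv for a locked-down, single-use container.

    Hypothetical helper: the default limits are placeholders to tune per workload.
    """
    return [
        "docker", "run",
        "--rm",                                # ephemeral: container is deleted on exit
        "--network", "none",                   # no direct network access from the sandbox
        "--read-only",                         # root filesystem is read-only
        "--tmpfs", "/tmp:rw,noexec,nosuid,size=64m",  # small in-memory scratch space
        "--memory", memory,                    # memory limit
        "--cpus", cpus,                        # CPU limit
        "--pids-limit", str(pids_limit),       # cap on process count (fork bombs)
        "--cap-drop", "ALL",                   # drop all Linux capabilities
        "--security-opt", "no-new-privileges", # block privilege escalation via setuid
        "--user", "65534:65534",               # run as an unprivileged uid:gid
        image,
        "sh", "-c", command,
    ]


if __name__ == "__main__":
    argv = build_sandbox_command("python:3.12-slim", "python -c 'print(1 + 1)'")
    # Print the command instead of running it, so the sketch works without Docker.
    print(shlex.join(argv))
```

Because no `-e` or `-v` options are passed, the container receives neither host environment variables nor host mounts, which is one way to keep the agent's credentials out of reach. With `--network none`, any outbound access would have to be provided separately by the tool broker.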

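The normalization and policy steps described above can be sketched as a small deny-by-default check inside a tool broker. This is not agentctl's API and not Cedar syntax; the `Rule`, `ToolCall`, and `authorize` names, the `Write` tool name, and the example principal are hypothetical, chosen only to mirror the principal/action/resource/environment shape the answer describes.

```python
import posixpath
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Assumed mapping from runtime-specific tool names to canonical actions.
CANONICAL_ACTIONS = {
    "Read": "read",
    "code_read": "read",
    "Write": "write",
    "shell": "execute",
}


@dataclass(frozen=True)
class Rule:
    principal: str           # who the rule applies to
    action: str              # canonical action: read, write, or execute
    resource_glob: str       # glob over file paths or command names
    environments: frozenset  # environments where the rule is active


@dataclass(frozen=True)
class ToolCall:
    principal: str
    tool: str      # tool name as emitted by the agent runtime
    resource: str  # file path or command the tool targets


def authorize(call, rules, env):
    """Return True only if some rule explicitly permits the call (deny by default)."""
    action = CANONICAL_ACTIONS.get(call.tool)
    if action is None:
        return False  # unknown tools are never allowed
    resource = call.resource
    if action in ("read", "write"):
        # Collapse ".." segments so a path cannot escape an allowed directory.
        resource = posixpath.normpath(resource)
    return any(
        rule.principal == call.principal
        and rule.action == action
        and env in rule.environments
        and fnmatchcase(resource, rule.resource_glob)
        for rule in rules
    )


if __name__ == "__main__":
    rules = [
        Rule("agent:coder", "read", "workspace/*", frozenset({"dev", "prod"})),
        Rule("agent:coder", "execute", "pytest", frozenset({"dev"})),
    ]
    call = ToolCall("agent:coder", "code_read", "workspace/src/main.py")
    print(authorize(call, rules, "dev"))
```

Two differently named tools (`Read` and `code_read`) resolve to the same `read` action, so a single rule covers both runtimes. A production policy engine would also need explicit deny rules, richer context such as time of day or IP range, and audit logging, none of which this sketch attempts.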