What is OWASP LLM01 prompt injection and how do I prevent it?

Question

Accepted Answer

OWASP LLM01 Prompt Injection occurs when attacker-controlled input, directly or indirectly, manipulates a language model into unintended actions or disclosures. This risk is often critical for customer-facing agents and maps to NIST-MEASURE-2.7.

To prevent OWASP LLM01 Prompt Injection, implement the following controls:

Input/Instruction Separation: Clearly separate user input from system instructions to prevent malicious input from being interpreted as commands.
Trust Boundaries on Retrieved/Tool Content: Establish clear trust boundaries for content retrieved from external sources or generated by tools, as these can be vectors for injection. A robust context security architecture should maintain a provenance graph for every context element, allowing any token to be traced to its source.
Output Handling that Assumes the Model Can Be Steered: Design output handling mechanisms with the assumption that the model's output might be influenced by an attacker, and validate or sanitize it accordingly.
Least-Privilege Tool Access: Grant tools the minimum necessary permissions to perform their functions, limiting the potential damage if a tool is misused due to prompt injection.
Human-in-the-Loop for High-Impact Actions: Implement human review or approval for actions that have significant consequences. This can involve explicit human approval or a second model invocation with adversarial framing to challenge proposed actions.
Adversarial Testing: Regularly conduct adversarial testing to identify and address prompt injection vulnerabilities. Frameworks like LAAF demonstrate that agentic LLM systems are highly susceptible to persistent, logic-layer attacks, and static defenses are often insufficient.

What is OWASP LLM01 prompt injection and how do I prevent it?

How does your AI agent score?

Related questions