What is OWASP LLM02 sensitive information disclosure and how do I stop an LLM from leaking PII and secrets?

Question

Accepted Answer

OWASP LLM02 (Sensitive Information Disclosure) occurs when an LLM reveals sensitive data in its output, such as PII, secrets, proprietary context, or another tenant's data. It is a critical risk for customer-facing and tool-using agents.

To prevent an LLM from leaking PII and secrets, implement the following controls:

Input/Output Scrubbing & Redaction: Scrub and redact sensitive information from both the inputs sent to the LLM and the outputs it returns. This covers full prompts that contain PII as well as completions that may contain inferences about individuals.
Data Minimization in Context: Include only the sensitive data that is strictly necessary in the LLM's context. One approach is a hierarchical context in which sensitive policy instructions are sealed and never compacted, and session-critical facts are protected from summarization.
Strict RAG-Source Scoping: When using Retrieval Augmented Generation (RAG), strictly scope the sources the LLM can retrieve from. For access control purposes, treat a vector database as if it holds the original text, and encrypt embeddings at rest.
Tenant Isolation: Ensure strict isolation of data between different tenants to prevent cross-tenant context leakage. This can involve separate physical or logical vector indexes for confidential data and explicit access control on memory retrieval. Memory stores should be partitioned by tenant and classification level.
Data Loss Prevention (DLP) on Responses: Apply DLP measures to the LLM's responses to detect and prevent the outflow of sensitive information. This includes output filtering and content classification on outgoing data.
No Secrets in Prompts: Avoid embedding secrets, credentials, or authorization logic directly into system prompts. Controls should be enforced in code and infrastructure, not within the prompt text itself.
Classification Inheritance: Any data derived from classified inputs should inherit at least the classification of its inputs to prevent PII leakage through derived data like embeddings or summaries.
Comprehensive Instrumentation and Tamper-Evident Audit Logs: Instrument the system by default so that all actions are logged, and keep tamper-evident audit logs (e.g., write-once storage, signed entries, append-only ledgers) to support forensic replay and to detect PII leakage through logs.
Right-to-Erasure Workflows: Establish workflows to propagate deletion requests across all data stores, including memory, embeddings, summaries, fine-tuning data, and logs, to address right-to-erasure failures.
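
As a rough illustration of how a few of these controls (input/output scrubbing, tenant- and classification-scoped retrieval, and classification inheritance) might be enforced in code rather than in the prompt, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the regex patterns, the Chunk type, the integer classification scale, and the call_model callable are hypothetical and do not come from OWASP or any particular library. A production system would likely use a dedicated DLP or PII-detection service instead of three regexes.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Illustrative detectors only (assumption): real DLP needs broader, tuned rules.
PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "AWS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def redact(text: str) -> str:
    """Replace every match of a known pattern with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text


@dataclass(frozen=True)
class Chunk:
    """A retrievable piece of context (hypothetical shape)."""
    tenant_id: str
    classification: int  # higher means more sensitive (hypothetical scale)
    text: str


def scoped_chunks(chunks: Iterable[Chunk], tenant_id: str, clearance: int) -> List[Chunk]:
    """Keep only chunks owned by this tenant and at or below the caller's clearance."""
    return [
        c for c in chunks
        if c.tenant_id == tenant_id and c.classification <= clearance
    ]


def derived_classification(inputs: Iterable[Chunk]) -> int:
    """Derived data (summaries, embeddings) inherits the highest input classification."""
    return max(c.classification for c in inputs)


def guarded_answer(
    call_model: Callable[[str], str],  # hypothetical stand-in for an LLM client
    question: str,
    chunks: Iterable[Chunk],
    tenant_id: str,
    clearance: int,
) -> str:
    """Scope retrieval, scrub the prompt, call the model, then scrub the response."""
    allowed = scoped_chunks(chunks, tenant_id, clearance)
    context = "\n".join(redact(c.text) for c in allowed)
    prompt = f"Context:\n{context}\n\nQuestion: {redact(question)}"
    return redact(call_model(prompt))
```

In this sketch the access check runs before anything reaches the prompt, so another tenant's data or over-classified data never enters the context, and the final redact call acts as a last-line filter on the response. The filtering lives in ordinary code, which matches the advice above to keep authorization logic out of the prompt text.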
