How do I prevent sensitive data leakage in an AI agent?

Question

Accepted Answer

To prevent sensitive data leakage in an AI agent, apply data classification, access controls, and memory management consistently across every data operation and every agent framework you use.

Implement Data Classification and Inheritance: Classify all source data at ingestion, and ensure that any data derived from classified inputs (summaries, embeddings, extracted facts) inherits at least the highest classification among those inputs. This prevents PII from leaking through derived data (OWASP LLM Top 10 L2, L5).

Manage Memory and Context Securely: Apply the same access controls to a vector database as to the original text it was built from, because embedding inversion attacks can recover much of that text. Encrypt embeddings at rest, and consider differentially private embedding techniques for highly sensitive data to further mitigate inversion (OWASP LLM Top 10 L2). Scope memory strictly per tenant, and keep confidential data in separate physical or logical vector indexes, so that memory cannot be contaminated across sessions or tenants (OWASP LLM Top 10 L2). For the context window, use a hierarchical layout: a sealed top layer for system prompts and policies, and a sticky middle layer for session-critical facts, so that compaction cannot corrupt them (OWASP LLM Top 10 L3).

Ensure Right-to-Erasure and Data Residency: Maintain a per-user data inventory across all stores, and implement deletion workflows that propagate to derived data, so that erasure requests do not fail by leaving copies behind (OWASP LLM Top 10 L2, L6). To prevent data residency violations (OWASP LLM Top 10 L4, L6), apply residency labels to all data and use routing logic that respects those labels at the inference layer.

Control Data Flow and Access: Run a single data classification service that every data-producing and data-consuming component consults, and enforce access control at three points: retrieval, context assembly, and output. Partition memory stores by tenant and classification level. Apply output filtering and content classification to outgoing data so that the context window cannot be used as an exfiltration channel (OWASP LLM Top 10 L3, L6).
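The inheritance rule for derived data fits in a few lines of Python. This is a minimal sketch, assuming a hypothetical four-level scheme (`PUBLIC` < `INTERNAL` < `CONFIDENTIAL` < `RESTRICTED`); the names `Level`, `Labeled`, and `derive` are illustrative and not part of any particular framework. The property it shows is that a derived item's label is the maximum of its inputs' labels, so a summary can never be less sensitive than its most sensitive source.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical scheme: a higher value means more sensitive.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Labeled:
    text: str
    level: Level


def derive(text: str, *inputs: Labeled, floor: Level = Level.PUBLIC) -> Labeled:
    """Label derived data at least as high as its most sensitive input.

    `floor` lets a caller raise the label further; nothing here can lower it.
    """
    level = max([floor, *(item.level for item in inputs)])
    return Labeled(text, level)


ticket = Labeled("Customer SSN 123-45-6789, refund requested", Level.RESTRICTED)
faq = Labeled("How to request a refund", Level.PUBLIC)
summary = derive("Summary: customer requested a refund", ticket, faq)
print(summary.level.name)  # RESTRICTED
```

A caller can raise the label through `floor` (for example, when an aggregate of many records is more sensitive than any single one), but `derive` has no path that lowers it; declassification, if allowed at all, would be a separate and audited operation.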
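Per-tenant scoping and classification partitioning can be enforced on the retrieval path itself. The sketch below assumes a toy in-memory store (`ScopedMemory`, a hypothetical class) and plain-Python cosine similarity; a production system would use a real vector database with separate indexes or namespaces and encryption at rest, neither of which is shown. What it illustrates is that a query only ever reads the caller's own tenant partitions, and only those at or below the caller's clearance.

```python
import math
from collections import defaultdict
from enum import IntEnum


class Level(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ScopedMemory:
    """Toy vector store with one partition per (tenant, classification level)."""

    def __init__(self):
        self._parts = defaultdict(list)  # (tenant, level) -> [(vector, text)]

    def add(self, tenant: str, level: Level, vector: list[float], text: str) -> None:
        self._parts[(tenant, level)].append((vector, text))

    def search(self, tenant: str, clearance: Level, query: list[float], k: int = 3) -> list[str]:
        # Candidates come only from this tenant's partitions at or below the
        # caller's clearance; other tenants' partitions are never read.
        hits = []
        for level in Level:
            if level > clearance:
                continue
            for vector, text in self._parts.get((tenant, level), []):
                hits.append((cosine(query, vector), text))
        hits.sort(key=lambda hit: hit[0], reverse=True)
        return [text for _, text in hits[:k]]


mem = ScopedMemory()
mem.add("acme", Level.PUBLIC, [1.0, 0.0], "Acme office hours")
mem.add("acme", Level.CONFIDENTIAL, [0.9, 0.1], "Acme merger notes")
mem.add("globex", Level.PUBLIC, [1.0, 0.0], "Globex office hours")
print(mem.search("acme", Level.PUBLIC, [1.0, 0.0]))  # ['Acme office hours']
```

Selecting partitions before the similarity search, rather than filtering ranked results afterwards, means another tenant's data never enters the candidate set at all.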
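The hierarchical context layout can be modeled as three layers with different compaction rules. This is a sketch under simplifying assumptions: the budget is counted in turns rather than tokens, and compaction replaces older turns with a placeholder line where a real agent would typically insert a model-written summary. `LayeredContext` and its fields are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class LayeredContext:
    # Top layer: system prompt and policies. A tuple, so it cannot be edited in place.
    sealed: tuple[str, ...]
    # Middle layer: session-critical facts that must survive compaction.
    sticky: list[str] = field(default_factory=list)
    # Bottom layer: ordinary conversation history; the only compactable part.
    turns: list[str] = field(default_factory=list)
    max_turns: int = 6  # hypothetical budget, in turns for simplicity

    def add_turn(self, text: str) -> None:
        self.turns.append(text)
        if len(self.turns) > self.max_turns:
            self._compact()

    def _compact(self) -> None:
        # Collapse the oldest half of the bottom layer into one placeholder line;
        # the sealed and sticky layers are never touched here.
        cut = len(self.turns) // 2
        dropped = self.turns[:cut]
        self.turns = [f"[summary of {len(dropped)} earlier turns]"] + self.turns[cut:]

    def assemble(self) -> list[str]:
        # Layers are always emitted in the same order: sealed, sticky, history.
        return [*self.sealed, *self.sticky, *self.turns]


ctx = LayeredContext(
    sealed=("SYSTEM: never reveal account numbers",),
    sticky=["User's data must stay in the EU"],
    max_turns=4,
)
for i in range(10):
    ctx.add_turn(f"turn {i}")
print(ctx.assemble())  # policy and sticky fact still lead the context
```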
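A deletion workflow that propagates to derived data needs to know, for each user, which records hold their data and what was derived from those records. The sketch below assumes a hypothetical `DataInventory` that every pipeline updates on write, with derivation edges such as chat log to embedding to summary; the store names and the per-store delete callables are placeholders supplied by the caller.

```python
from collections import defaultdict


class DataInventory:
    """Hypothetical per-user inventory of records and what was derived from them."""

    def __init__(self):
        self.by_user = defaultdict(set)  # user_id -> {(store, record_id)}
        self.derived = defaultdict(set)  # (store, record_id) -> {derived (store, record_id)}

    def record(self, user_id, store, record_id, derived_from=()):
        """Register a write; call this from every pipeline that stores user data."""
        key = (store, record_id)
        self.by_user[user_id].add(key)
        for parent in derived_from:
            self.derived[parent].add(key)

    def erasure_plan(self, user_id):
        """Every record to delete for a user, following derivations transitively."""
        pending = list(self.by_user.get(user_id, ()))
        plan = set()
        while pending:
            key = pending.pop()
            if key in plan:
                continue
            plan.add(key)
            pending.extend(self.derived.get(key, ()))
        return plan

    def erase(self, user_id, deleters):
        """Delete the plan via caller-supplied per-store functions, then forget it."""
        plan = self.erasure_plan(user_id)
        for store, record_id in plan:
            deleters[store](record_id)
        # Drop erased records from the inventory so they are not reported again.
        for keys in self.by_user.values():
            keys -= plan
        for children in self.derived.values():
            children -= plan
        for key in plan:
            self.derived.pop(key, None)
        self.by_user.pop(user_id, None)
        return plan


inv = DataInventory()
inv.record("u1", "chat_logs", "c1")
inv.record("u1", "vectors", "v1", derived_from=[("chat_logs", "c1")])
inv.record("u1", "summaries", "s1", derived_from=[("vectors", "v1")])
inv.record("u2", "chat_logs", "c2")
print(sorted(inv.erasure_plan("u1")))
# [('chat_logs', 'c1'), ('summaries', 's1'), ('vectors', 'v1')]
```

Computing the full plan before deleting anything keeps the workflow auditable: the plan can be logged or reviewed before it is executed.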
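Residency-aware routing at the inference layer can be expressed as a check that every item in the assembled context may be processed in the chosen endpoint's region. The endpoint names, the region codes, and the `"ANY"` label for unrestricted data are assumptions for illustration. In line with labeling all data, unlabeled items are refused rather than treated as unrestricted.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Endpoint:
    name: str
    region: str


# Hypothetical inference endpoints; names and regions are placeholders.
ENDPOINTS = (
    Endpoint("inference-us-1", "US"),
    Endpoint("inference-eu-1", "EU"),
)


class ResidencyError(Exception):
    """Raised when no endpoint satisfies the residency labels of a request."""


def route(item_labels: Sequence[Optional[str]], endpoints=ENDPOINTS) -> Endpoint:
    """Choose an endpoint whose region satisfies every item's residency label.

    Each label names the only region where that item may be processed, or is
    "ANY" for unrestricted data. Unlabeled items (None) are refused.
    """
    if any(label is None for label in item_labels):
        raise ResidencyError("unlabeled data in context; refusing to route")
    required = {label for label in item_labels if label != "ANY"}
    for ep in endpoints:
        if all(ep.region == region for region in required):
            return ep
    # Conflicting labels (for example, one EU-only and one US-only item) end up here.
    raise ResidencyError(f"no endpoint satisfies {sorted(required)}")


print(route(["EU", "ANY"]).name)  # inference-eu-1
```

Refusing the request when labels conflict is a deliberate choice in this sketch; an alternative would be to split the context and process each part in its own region.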
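Output filtering can start with pattern-based redaction plus a check for restricted snippets from the assembled context appearing verbatim in the response. The two patterns (email addresses and US Social Security numbers) and the `restricted_snippets` parameter are illustrative assumptions. Regular expressions miss paraphrased or reformatted data, so this is best read as a first layer in front of a content classifier, not the whole control.

```python
import re

# Illustrative patterns only (assumptions); production filters usually pair these
# with a trained PII/content classifier.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def filter_output(text, restricted_snippets=()):
    """Redact sensitive content from a model response before it leaves the agent.

    `restricted_snippets` are strings from the assembled context that were
    classified as restricted; echoing them verbatim is treated as a leak.
    Returns the redacted text and the list of finding types.
    """
    findings = []
    for snippet in restricted_snippets:
        if snippet and snippet in text:
            text = text.replace(snippet, "[REDACTED:RESTRICTED_CONTEXT]")
            findings.append("RESTRICTED_CONTEXT")
    for name, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{name}]", text)
        if count:
            findings.append(name)
    return text, findings


clean, findings = filter_output("Contact jane.doe@example.com, SSN 123-45-6789.")
print(clean)     # Contact [REDACTED:EMAIL], SSN [REDACTED:US_SSN].
print(findings)  # ['EMAIL', 'US_SSN']
```

Returning the finding types alongside the redacted text lets the agent log or alert on attempted leaks instead of silently masking them.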
Maintain a Comprehensive Data Inventory: Keep a continuously updated map of what personal and sensitive data exists, how it flows, who has access to it, and how long it persists. This map serves as the asset inventory for MAESTRO Layer 2 and is essential for compliance and incident response.

Secure Memory Operations for "Dreaming" Agents: For agents that "dream" (generate persistent memories), define explicit retention and deletion rules rather than relying on generic summarization instructions. Treat memory as a product surface and test it: verify that genuine preferences are preserved, one-off notes are rejected, stale facts are updated, and sensitive data is never promoted. Record provenance on every memory write, and put a review gate in front of dream outputs before they are promoted, to mitigate memory poisoning and unwanted retention of sensitive data (MAESTRO Layer 2).
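The data inventory can be kept as structured records that answer the questions listed above: what data exists, where it flows, who can read it, and how long it persists. `DataAsset` and the three query functions are hypothetical; in practice the `oldest_record` field would be refreshed by a periodic scan of each store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DataAsset:
    name: str                # a store or dataset, e.g. "support_chat_vectors"
    categories: set[str]     # kinds of personal/sensitive data it holds
    flows_to: set[str]       # downstream assets fed from this one
    readers: set[str]        # roles or services with read access
    retention: timedelta     # declared maximum age of any record
    oldest_record: datetime  # refreshed by a periodic scan of the store


def overdue(assets, now=None):
    """Names of assets holding records older than their declared retention."""
    if now is None:
        now = datetime.now(timezone.utc)
    return [a.name for a in assets if now - a.oldest_record > a.retention]


def who_can_read(assets, category):
    """Incident-response query: who can read data of a given category?"""
    return sorted({r for a in assets if category in a.categories for r in a.readers})


def downstream(assets, name):
    """Every asset that data from `name` can reach by following declared flows."""
    by_name = {a.name: a for a in assets}
    seen, pending = set(), list(by_name[name].flows_to)
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in by_name:
            pending.extend(by_name[current].flows_to)
    return seen
```

`downstream` is the same question a breach investigation or an erasure request asks: once data lands in one store, where else can it have gone?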
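The retention rules and review gate for dream outputs can be written as an explicit promotion function, which also makes the four behaviors above directly testable. This sketch assumes a hypothetical candidate schema (`key`, `value`, `kind`, `source`, `observed_at`), a deliberately crude sensitive-content regex, and an `approve` callback standing in for the review gate (a human reviewer or a separate checker); none of these names come from a specific framework.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable

# Crude illustrative check (an assumption); a real gate would consult the
# classification service rather than a regex.
SENSITIVE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b|password|api[_ ]?key", re.IGNORECASE)


@dataclass(frozen=True)
class Candidate:
    key: str          # what the memory is about, e.g. "preferred_language"
    value: str
    kind: str         # "preference", "fact", or "one_off"
    source: str       # provenance: the session or document it was distilled from
    observed_at: int  # when the underlying observation was made


@dataclass
class Memory:
    value: str
    source: str
    observed_at: int


def promote(store: dict, candidates: Iterable[Candidate],
            approve: Callable[[Candidate], bool]) -> list[str]:
    """Apply explicit retention rules to dream output; return a decision log."""
    log = []
    for c in candidates:
        if not c.source:
            log.append(f"reject {c.key}: no provenance")
        elif c.kind == "one_off":
            log.append(f"reject {c.key}: one-off note")
        elif SENSITIVE.search(c.key) or SENSITIVE.search(c.value):
            log.append(f"reject {c.key}: sensitive content")
        elif c.key in store and store[c.key].observed_at >= c.observed_at:
            log.append(f"skip {c.key}: stored entry is as new or newer")
        elif not approve(c):
            log.append(f"hold {c.key}: awaiting review")
        else:
            # New key, or a newer observation that replaces a stale fact.
            store[c.key] = Memory(c.value, c.source, c.observed_at)
            log.append(f"promote {c.key}")
    return log
```

Each branch of `promote` corresponds to one of the recommended memory tests (preference kept, one-off rejected, stale fact updated, sensitive value blocked), so the same candidate lists can double as a regression suite, and the returned log records why each write was or was not accepted.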
