How do I vet third-party knowledge sources and datasets before adding them to RAG?
To vet third-party knowledge sources and datasets before adding them to RAG, establish a supply-chain risk policy, document the system's context, inventory every component and data flow, and identify the potential impacts of each source.
- Establish a Third-Party/Supply-Chain Risk Policy (NIST-GOVERN-6.1): Develop policies that specifically address risks from third-party models, datasets, and tools. The policy should require tracking of provenance, licensing, and model-update risks, which addresses the OWASP LLM Top 10 supply-chain category (LLM05 in the 2023 list, LLM03 in the 2025 list).
- Document System Context and Intended Purpose (NIST-MAP-1.1): Clearly document the intended purpose, deployment setting, and operating context of the AI/agent system, including data sensitivity, so that each candidate source can be judged against what the system is meant to handle.
- Maintain an AI System Inventory (NIST-GOVERN-1.6): Keep a current inventory of all AI/agent systems, including models, agents, tools, and data flows. This ensures that all components, including third-party datasets, are known and can be governed.
- Identify Impact and Harm (NIST-MAP-5.1): Identify potential positive and negative impacts on individuals, groups, and society, paying close attention to the data sensitivity and regulated-data exposure of each third-party source.
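The controls above can be turned into a simple pre-ingestion gate. The following is a minimal sketch, not a prescribed implementation: the `SourceRecord` fields, the `APPROVED_LICENSES` allowlist, and the `vet_source` function are all hypothetical names chosen for illustration, and the specific checks (HTTPS origin, license allowlist, SHA-256 digest match, regulated-data flag) are assumptions about what a policy might require rather than anything the NIST or OWASP text mandates.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical allowlist; a real policy would define its own approved licenses.
APPROVED_LICENSES = {"CC-BY-4.0", "CC0-1.0", "MIT"}


@dataclass(frozen=True)
class SourceRecord:
    """One inventory entry for a third-party knowledge source (illustrative fields)."""
    name: str
    origin_url: str                # provenance: where the data was obtained
    license_id: str                # SPDX-style license identifier
    expected_sha256: str           # digest recorded when the source was reviewed
    contains_regulated_data: bool  # outcome of the impact assessment


def sha256_of(content: bytes) -> str:
    """Hex SHA-256 digest of the raw source content."""
    return hashlib.sha256(content).hexdigest()


def vet_source(record: SourceRecord, content: bytes) -> list[str]:
    """Return policy findings; an empty list means no blocker was found."""
    findings = []
    if not record.origin_url.startswith("https://"):
        findings.append("provenance: origin is not an HTTPS URL")
    if record.license_id not in APPROVED_LICENSES:
        findings.append(f"licensing: {record.license_id} is not on the approved list")
    if sha256_of(content) != record.expected_sha256:
        findings.append("integrity: content changed since it was reviewed")
    if record.contains_regulated_data:
        findings.append("impact: regulated data requires a separate review")
    return findings


# Usage with made-up data: the source passes until its content changes.
data = b"example knowledge base article"
record = SourceRecord(
    name="vendor-faq",
    origin_url="https://example.com/faq.jsonl",
    license_id="CC-BY-4.0",
    expected_sha256=sha256_of(data),
    contains_regulated_data=False,
)
print(vet_source(record, data))                 # []
print(vet_source(record, data + b" tampered"))  # one integrity finding
```

Returning a list of findings instead of a single pass/fail flag keeps every reason a source was held back, which can be stored alongside the inventory entry for later review.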
Grounded in
- NIST AI Risk Management Framework (NIST AI RMF)
This AI-generated answer is for guidance only — not a certification, audit, or penetration test. Grounded in the NIST AI RMF, OWASP LLM Top 10, and ISO/IEC 42001 control text; verify applicability to your environment.