How do I vet third-party knowledge sources and datasets before adding them to RAG?

Question

Accepted Answer

To vet third-party knowledge sources and datasets before adding them to RAG, establish a supply-chain risk policy, document the system's context, keep an inventory that includes those sources, and identify their potential impacts.

Establish a Third-Party/Supply-Chain Risk Policy (NIST-GOVERN-6.1): Develop policies that specifically address the risks of third-party models, datasets, and tools. The policy should track provenance, licensing, and model-update risks, which directly addresses OWASP LLM03/LLM05 (supply chain).

Document System Context and Intended Purpose (NIST-MAP-1.1): Clearly document the intended purpose, deployment setting, and operating context of the AI/agent system, including the sensitivity of the data it handles. A source is vetted against this context, so it needs to be written down first.

Maintain an AI System Inventory (NIST-MAP-1.5): Keep a current inventory of all AI/agent systems, including models, agents, tools, and data flows. This ensures that every component, including each third-party dataset, is known and can be governed.

Identify Impact and Harm (NIST-MAP-5.1): Identify potential positive and negative impacts on individuals, groups, and society, paying close attention to data sensitivity and regulated-data exposure associated with the third-party sources.
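As an illustration only, the provenance, licensing, and data-sensitivity checks described above could be sketched as a small pre-ingestion gate in Python. Everything named here (`SourceRecord`, `vet_source`, the allowed-license and allowed-sensitivity sets, the example URL) is a hypothetical placeholder and is not defined by NIST or OWASP; a real policy would supply its own fields and values.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical policy values; in practice these come from your own
# third-party/supply-chain risk policy, not from this sketch.
ALLOWED_LICENSES = {"CC-BY-4.0", "CC0-1.0", "MIT"}
ALLOWED_SENSITIVITY = {"public", "internal"}


@dataclass
class SourceRecord:
    """Inventory entry for one third-party source (illustrative fields)."""
    name: str
    origin_url: str        # provenance: where the data was obtained
    license: str
    sensitivity: str       # e.g. "public", "internal", "regulated"
    expected_sha256: str   # hash recorded when the source was reviewed


def vet_source(record: SourceRecord, content: bytes) -> list[str]:
    """Return a list of policy violations; an empty list means the source passes."""
    problems = []
    if not record.origin_url:
        problems.append("missing provenance (origin_url)")
    if record.license not in ALLOWED_LICENSES:
        problems.append(f"license not approved: {record.license}")
    if record.sensitivity not in ALLOWED_SENSITIVITY:
        problems.append(f"sensitivity not allowed: {record.sensitivity}")
    # Detect silent upstream changes since the source was reviewed.
    if hashlib.sha256(content).hexdigest() != record.expected_sha256:
        problems.append("content hash differs from reviewed version")
    return problems


# Usage: only sources that pass every check are added to the inventory
# (and, by extension, to the RAG index).
content = b"example corpus text"
record = SourceRecord(
    name="example-docs",
    origin_url="https://example.com/docs.zip",
    license="CC-BY-4.0",
    sensitivity="public",
    expected_sha256=hashlib.sha256(content).hexdigest(),
)

inventory: dict[str, SourceRecord] = {}
problems = vet_source(record, content)
if not problems:
    inventory[record.name] = record
else:
    print("rejected:", problems)
```

Returning the full list of violations, rather than failing on the first one, gives the reviewer a complete picture of why a source was rejected. The impact and harm assessment remains a human review step; a gate like this only enforces the parts of the policy that can be checked mechanically.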


