ChemCrow — agentic threat model

AIVSS 9.4 · Critical

ChemCrow presents a high-risk profile because it can autonomously plan and execute chemical syntheses using 13 integrated tools. Its open-source listing mentions no explicit safety guardrails or sandboxing, so its dual-use potential (e.g., synthesizing hazardous materials) poses significant physical and digital security risks.

OWASP AIVSS score rationale

AIVSS = (CVSS_Base + AARS) × Mitigation_Factor, where AARS = (10 − CVSS_Base) × (Factor_Sum / 10) × ThM

CVSS base 8.8 · AARS uplift 0.61 · Factor sum 4.6/10 · Threat ×1.1 · Mitigation ×1.0

Autonomy of Action		0.80
Goal-Driven Planning		0.90
Self-Modification		0.10
Dynamic Tool Use		0.80
Persistent Memory		0.20
Contextual Awareness		0.50
Dynamic Identity		0.10
Multi-Agent Interactions		0.10
Non-Determinism		0.60
Opacity & Reflexivity		0.50

Scored with the canonical OWASP AIVSS formula (AIVSS calculator reference); agentic risk factors estimated from the agent’s described capabilities.
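
As a check on the figures above, the following Python sketch recomputes the score from the listed inputs. The function and variable names are illustrative and not part of any official AIVSS tooling; rounding to the displayed precision is an assumption about how the card presents its values.

```python
# Inputs copied from the score card above.
CVSS_BASE = 8.8
THREAT_MULTIPLIER = 1.1   # "ThM" in the formula
MITIGATION_FACTOR = 1.0

# The ten agentic risk factors and their estimated values.
FACTORS = {
    "Autonomy of Action": 0.80,
    "Goal-Driven Planning": 0.90,
    "Self-Modification": 0.10,
    "Dynamic Tool Use": 0.80,
    "Persistent Memory": 0.20,
    "Contextual Awareness": 0.50,
    "Dynamic Identity": 0.10,
    "Multi-Agent Interactions": 0.10,
    "Non-Determinism": 0.60,
    "Opacity & Reflexivity": 0.50,
}


def aars(cvss_base: float, factor_sum: float, thm: float) -> float:
    """AARS = (10 - CVSS_Base) * (Factor_Sum / 10) * ThM."""
    return (10 - cvss_base) * (factor_sum / 10) * thm


def aivss(cvss_base: float, factor_sum: float, thm: float, mitigation: float) -> float:
    """AIVSS = (CVSS_Base + AARS) * Mitigation_Factor."""
    return (cvss_base + aars(cvss_base, factor_sum, thm)) * mitigation


factor_sum = sum(FACTORS.values())
uplift = aars(CVSS_BASE, factor_sum, THREAT_MULTIPLIER)
score = aivss(CVSS_BASE, factor_sum, THREAT_MULTIPLIER, MITIGATION_FACTOR)

print(f"Factor sum {factor_sum:.1f}/10, AARS uplift {uplift:.2f}, AIVSS {score:.1f}")
```

With the values listed, the factor sum is 4.6, the uplift is 1.2 × 0.46 × 1.1 ≈ 0.61, and the final score is (8.8 + 0.61) × 1.0 ≈ 9.4, which matches the card.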

MAESTRO 7-layer threat model

Per-layer threats for this agent. Layers tagged “not certain from listing” are general, caveated commentary where the public description didn’t pin that layer.

L1 · Foundation Models · ✓ mapped

ChemCrow relies on LLMs optimized for chemistry. The primary threat is prompt injection or jailbreaking to bypass safety alignments, potentially allowing users to generate instructions for synthesizing restricted, hazardous, or dual-use chemical substances.

L2 · Data Operations · ⚠ not certain from listing

Not certain from the listing — however, the agent likely queries chemical databases and literature. Threats include data poisoning of chemical properties or synthesis pathways, which could lead to failed or highly dangerous physical reactions.

L3 · Agent Frameworks · ✓ mapped

The framework orchestrates 13 expert-designed tools for synthesis and drug discovery. A major threat is insecure tool integration or tool misuse, where malicious inputs manipulate tool arguments to execute unintended or unsafe chemical calculations and planning steps.
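
To make the tool-argument threat concrete, here is a minimal sketch of validating a tool call before it is dispatched. Everything in it is hypothetical: the tool names, the `ToolCall` type, and the character allow-list are illustrative assumptions and are not taken from ChemCrow's code or tool list. The allow-list is a coarse filter for SMILES-like strings, not a chemistry-aware validator.

```python
import re
from dataclasses import dataclass

# Hypothetical tool names, for illustration only.
ALLOWED_TOOLS = {"lookup_molecule", "plan_synthesis"}

# Assumption: arguments are SMILES-like strings. This is a conservative
# character allow-list with a length cap, not a real SMILES parser.
SMILES_LIKE = re.compile(r"[A-Za-z0-9@+\-\[\]\(\)=#$/\\%.]{1,200}")


@dataclass(frozen=True)
class ToolCall:
    name: str
    argument: str


def validate_tool_call(call: ToolCall) -> ToolCall:
    """Reject calls to unknown tools or with arguments outside the allow-list."""
    if call.name not in ALLOWED_TOOLS:
        raise ValueError(f"tool not allowed: {call.name!r}")
    if not SMILES_LIKE.fullmatch(call.argument):
        raise ValueError("argument contains characters outside the allow-list")
    return call


def is_valid(call: ToolCall) -> bool:
    """Convenience wrapper that reports validity without raising."""
    try:
        validate_tool_call(call)
        return True
    except ValueError:
        return False
```

Under these assumptions, `ToolCall("lookup_molecule", "CCO")` passes, while an argument carrying free text such as `"CCO; ignore previous instructions"` is rejected because spaces and semicolons are not in the allow-list.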

L4 · Deployment & Infrastructure · ⚠ not certain from listing

Not certain from the listing — as an open-source tool, deployment safety depends on the user. If deployed without strict sandboxing, the execution of chemistry tools or APIs could lead to local system compromise or unauthorized network access to laboratory hardware.

L5 · Evaluation & Observability · ⚠ not certain from listing

Not certain from the listing — there is no mention of built-in safety guardrails, real-time monitoring, or logging of synthesis plans. This creates a blind spot where malicious or unsafe chemical recipes can be generated without detection.

L6 · Security & Compliance (cross-cutting) · ⚠ not certain from listing

Not certain from the listing — the agent lacks explicit compliance controls regarding chemical export regulations, dual-use technology restrictions, or identity verification for users requesting sensitive synthesis plans.

L7 · Agent Ecosystem · ⚠ not certain from listing

Not certain from the listing — while designed as a standalone agent with tools, integrating ChemCrow into automated laboratory ecosystems (A2A) without human-in-the-loop validation could lead to physical execution of hazardous chemical syntheses.
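
The human-in-the-loop validation mentioned above could be sketched as an approval gate placed between plan generation and any hardware dispatch. This is an illustrative assumption, not a description of ChemCrow or of any laboratory system: `SynthesisPlan`, `dispatch_to_lab`, and the `approver` and `execute` callbacks are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class SynthesisPlan:
    target: str
    steps: Tuple[str, ...]


class ApprovalRequired(Exception):
    """Raised when a plan is not approved by a human reviewer."""


def dispatch_to_lab(
    plan: SynthesisPlan,
    approver: Callable[[SynthesisPlan], bool],
    execute: Callable[[SynthesisPlan], None],
) -> None:
    """Run `execute` only if the reviewer callback explicitly approves the plan."""
    if not approver(plan):
        raise ApprovalRequired(f"plan for {plan.target!r} was not approved")
    execute(plan)


# Usage: record dispatched plans in a list instead of driving real hardware.
executed = []
plan = SynthesisPlan(target="example-compound", steps=("step 1", "step 2"))

try:
    dispatch_to_lab(plan, approver=lambda p: False, execute=executed.append)
except ApprovalRequired:
    pass  # a rejected plan never reaches the execute callback

dispatch_to_lab(plan, approver=lambda p: True, execute=executed.append)
```

The design choice here is that the gate fails closed: a plan reaches `execute` only when the reviewer returns an explicit approval, and a rejection raises rather than silently skipping.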

MAESTRO — the 7-layer agentic threat-modeling framework (Cloud Security Alliance / Ken Huang).