ChatDev — agentic threat model

AIVSS 9.5 · Critical

ChatDev's multi-agent architecture and automated code generation/testing capabilities present a high risk of arbitrary code execution on the host system if the framework is manipulated via prompt injection or malicious design requirements.

OWASP AIVSS score rationale

AIVSS = (CVSS_Base + AARS) × Mitigation_Factor, where AARS = (10 − CVSS_Base) × (Factor_Sum / 10) × ThM, Factor_Sum is the sum of the ten agentic risk factors below, and ThM is the threat multiplier.

CVSS base 8.5 · AARS uplift 0.96 · Factor sum 6.4/10 · Threat ×1.0 · Mitigation ×1.0

Autonomy of Action		0.80
Goal-Driven Planning		0.90
Self-Modification		0.20
Dynamic Tool Use		0.70
Persistent Memory		0.40
Contextual Awareness		0.60
Dynamic Identity		0.30
Multi-Agent Interactions		1.00
Non-Determinism		0.80
Opacity & Reflexivity		0.70

Scored with the canonical OWASP AIVSS formula (AIVSS calculator reference); the agentic risk factors were estimated from the agent’s described capabilities.
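The score arithmetic can be checked mechanically. The Python sketch below recomputes the factor sum, the AARS uplift, and the final AIVSS from the ten factor values and the formula quoted above; the function `aivss` and its parameter names are illustrative and are not taken from any official OWASP calculator.

```python
# Sketch: recompute this card's AIVSS score from the inputs listed above.
#   AARS  = (10 - CVSS_Base) * (Factor_Sum / 10) * ThM
#   AIVSS = (CVSS_Base + AARS) * Mitigation_Factor

AGENTIC_FACTORS = {
    "Autonomy of Action": 0.80,
    "Goal-Driven Planning": 0.90,
    "Self-Modification": 0.20,
    "Dynamic Tool Use": 0.70,
    "Persistent Memory": 0.40,
    "Contextual Awareness": 0.60,
    "Dynamic Identity": 0.30,
    "Multi-Agent Interactions": 1.00,
    "Non-Determinism": 0.80,
    "Opacity & Reflexivity": 0.70,
}


def aivss(cvss_base: float, factors: dict[str, float],
          threat_multiplier: float = 1.0,
          mitigation_factor: float = 1.0) -> tuple[float, float, float]:
    """Return (factor_sum, aars, aivss) using the formula quoted in this card."""
    factor_sum = sum(factors.values())
    # AARS scales the remaining headroom above the CVSS base by the factor sum.
    aars = (10 - cvss_base) * (factor_sum / 10) * threat_multiplier
    return factor_sum, aars, (cvss_base + aars) * mitigation_factor


factor_sum, aars, score = aivss(8.5, AGENTIC_FACTORS)
print(f"factor sum {factor_sum:.1f}, AARS {aars:.2f}, AIVSS {score:.2f} (rounded {score:.1f})")
# prints: factor sum 6.4, AARS 0.96, AIVSS 9.46 (rounded 9.5)
```

Because the threat and mitigation multipliers are both 1.0 here, the result depends only on the CVSS base and the factor sum: the ten factors close 64% of the 1.5-point gap between 8.5 and 10.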

MAESTRO 7-layer threat model

Per-layer threats for this agent. Layers tagged “not certain from listing” contain general, caveated commentary where the public description does not address that layer.

L1 · Foundation Models · ✓ mapped

ChatDev relies heavily on LLMs for role-playing and code generation, which makes it highly susceptible to prompt injection, jailbreaking, and the generation of vulnerable or malicious code.

L2 · Data Operations · ⚠ not certain from listing

Not certain from the listing: the description does not specify how ChatDev handles training data, vector databases, or retrieval-augmented generation (RAG), although exfiltration of generated code and other intellectual property remains a potential risk.

L3 · Agent Frameworks · ✓ mapped

The framework orchestrates complex agent interactions (planning, coding, testing). A major threat is insecure tool integration, particularly if the automated testing phase executes generated code without strict validation.
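One way to add the "strict validation" mentioned above is a static pre-screen that runs before the testing phase executes anything. The sketch below uses Python's standard `ast` module with a small, hypothetical denylist; it is not part of ChatDev. Denylists are easy to evade (for example through `getattr`, aliasing, or string building), so this should complement sandboxed execution rather than replace it.

```python
import ast

# Hypothetical denylists for illustration; a production policy would be broader.
BLOCKED_MODULES = {"os", "subprocess", "socket", "shutil", "ctypes", "importlib"}
BLOCKED_CALLS = {"eval", "exec", "compile", "__import__", "open"}


def screen_generated_code(source: str) -> list[str]:
    """Statically scan generated Python; an empty list means no denylisted item was found."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"unparseable code: {exc.msg}"]
    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    findings.append(f"line {node.lineno}: import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x"
            if node.module and node.module.split(".")[0] in BLOCKED_MODULES:
                findings.append(f"line {node.lineno}: from {node.module} import")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only catches direct calls like eval(...), not getattr tricks or aliases.
            if node.func.id in BLOCKED_CALLS:
                findings.append(f"line {node.lineno}: call to {node.func.id}()")
    return findings


sample = "import os\nresult = eval(input())\nprint(result)\n"
print(screen_generated_code(sample))
# prints: ['line 1: import os', 'line 2: call to eval()']
```

Any non-empty result would send the artifact back for revision or human review instead of straight into execution.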

L4 · Deployment & Infrastructure · ⚠ not certain from listing

Not certain from the listing: because ChatDev is an open-source framework, deployment is user-managed. Running it without containerized sandboxing poses a severe risk of host compromise during code-execution phases.
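As a sketch of what containerized sandboxing could look like in practice, the Python helper below only builds (it does not run) a `docker run` command using standard Docker CLI hardening flags. The helper name, image, resource limits, and paths are assumptions; ChatDev does not ship this helper, and a container is a mitigation layer, not a guarantee against kernel-level escapes.

```python
import shlex


def sandbox_command(workdir: str, entrypoint: str = "main.py",
                    image: str = "python:3.11-slim") -> list[str]:
    """Build (but do not run) a hardened `docker run` argv for generated code.

    The image, limits, and mount layout are illustrative assumptions.
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no network access from generated code
        "--read-only",                          # container root filesystem is read-only
        "--tmpfs", "/tmp:rw,noexec,size=64m",   # small writable scratch area, non-executable
        "--cap-drop", "ALL",                    # drop every Linux capability
        "--security-opt", "no-new-privileges",  # block privilege gain via setuid binaries
        "--pids-limit", "128",                  # cap process count (fork bombs)
        "--memory", "512m", "--cpus", "1",      # bound memory and CPU
        "--user", "65534:65534",                # run as an unprivileged UID/GID (nobody)
        "-v", f"{workdir}:/workspace:ro",       # project mounted read-only
        "-w", "/workspace",
        image, "python", entrypoint,
    ]


print(shlex.join(sandbox_command("/tmp/chatdev_project")))
```

An orchestrator could pass the resulting list to `subprocess.run` with a timeout. Because the project directory is mounted read-only, anything the generated program legitimately needs to write has to go to `/tmp` inside the container.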

L5 · Evaluation & Observability · ⚠ not certain from listing

Not certain from the listing: the description does not mention built-in evaluation, logging, or guardrails for monitoring agent conversations or detecting anomalous code-generation patterns.
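If a deployment needs the missing observability, a minimal option is an append-only audit log of every inter-agent message, with cheap pattern checks that flag messages for human review. The sketch below is a hypothetical add-on, not a ChatDev feature; the pattern set, field names, and JSON Lines format are assumptions, and a regex match is a review signal, not proof of abuse.

```python
import json
import os
import re
import tempfile
import time

# Hypothetical review triggers; a match is a signal for human review, not proof of abuse.
SUSPICIOUS_PATTERNS = {
    "shell_exec": re.compile(r"\b(os\.system|subprocess\.|Popen)"),
    "dynamic_eval": re.compile(r"\b(eval|exec)\s*\("),
    "network": re.compile(r"\b(socket\.|requests\.(get|post)|urllib\.request)"),
    "injection_phrase": re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
}


def audit_message(sender: str, receiver: str, content: str, log_path: str) -> list[str]:
    """Append one JSON line per inter-agent message and return matched pattern names."""
    flags = [name for name, rx in SUSPICIOUS_PATTERNS.items() if rx.search(content)]
    record = {"ts": time.time(), "from": sender, "to": receiver,
              "flags": flags, "content": content}
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return flags


log_file = os.path.join(tempfile.mkdtemp(), "agent_audit.jsonl")
flags = audit_message("Programmer", "Tester", "subprocess.run(['curl', url])", log_file)
print(flags)  # ['shell_exec']
```

Writing one JSON object per line keeps the log easy to append to and to replay later when reconstructing how a suspicious artifact was produced.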

L6 · Security & Compliance (cross-cutting) · ⚠ not certain from listing

Not certain from the listing: the description gives no details on security compliance, access controls, or policy-enforcement mechanisms within the simulated company.
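Access control for the simulated roles can be enforced outside the language model. The sketch below assumes a deny-by-default allow-list mapping each role to the tools it may invoke; the role-to-tool mapping and the tool names (`write_file`, `run_in_sandbox`, and so on) are hypothetical and are not ChatDev identifiers.

```python
from typing import Callable

# Hypothetical allow-list of tools per simulated role; unknown roles get nothing.
ROLE_TOOLS: dict[str, frozenset[str]] = {
    "CEO": frozenset({"send_message"}),
    "CTO": frozenset({"send_message", "read_repo"}),
    "Programmer": frozenset({"send_message", "read_repo", "write_file"}),
    "Tester": frozenset({"send_message", "read_repo", "run_in_sandbox"}),
}


def is_allowed(role: str, tool: str) -> bool:
    """Deny by default: unlisted roles and unlisted tools are refused."""
    return tool in ROLE_TOOLS.get(role, frozenset())


def invoke_tool(role: str, tool: str, dispatch: Callable[[str], str]) -> str:
    """Check the policy in orchestrator code, outside any LLM prompt, then dispatch."""
    if not is_allowed(role, tool):
        raise PermissionError(f"role {role!r} may not call {tool!r}")
    return dispatch(tool)


print(invoke_tool("Tester", "run_in_sandbox", lambda t: f"ran {t}"))  # ran run_in_sandbox
try:
    invoke_tool("Programmer", "run_in_sandbox", lambda t: f"ran {t}")
except PermissionError as err:
    print(err)  # role 'Programmer' may not call 'run_in_sandbox'
```

The design choice is that `role` is supplied by the orchestrator, not parsed from agent output, so a prompt telling the Programmer that it is now the Tester does not change what it can execute.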

L7 · Agent Ecosystem · ✓ mapped

The core multi-agent ecosystem (CEO, CTO, programmer, tester) introduces the threat of agent-to-agent trust abuse, in which a compromised 'programmer' agent could deceive the 'tester' or 'CEO' agent into approving malicious code.
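Two simple controls narrow this scenario: an agent may not approve its own output, and an approval is bound to a SHA-256 digest of the exact code that was reviewed, so anything altered after review is refused. The Python sketch below illustrates both; the `Approval` record and the helper names are hypothetical and not part of ChatDev. It cannot stop a reviewer from being persuaded to approve malicious code in the first place, so it complements sandboxed execution rather than replacing it.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Approval:
    reviewer_role: str   # assigned by the orchestrator, never parsed from agent text
    author_role: str
    code_sha256: str     # digest of the exact artifact the reviewer inspected


def digest(code: str) -> str:
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


def approve(reviewer_role: str, author_role: str, code: str) -> Approval:
    """Record an approval bound to this exact code; no agent may approve its own work."""
    if reviewer_role == author_role:
        raise ValueError("an agent may not approve its own code")
    return Approval(reviewer_role, author_role, digest(code))


def may_execute(approval: Approval, code_to_run: str) -> bool:
    """Allow execution only if the code is byte-for-byte what was reviewed."""
    return approval.code_sha256 == digest(code_to_run)


reviewed = "print('hello')\n"
approval = approve("Tester", "Programmer", reviewed)
print(may_execute(approval, reviewed))                  # True
print(may_execute(approval, reviewed + "import os\n"))  # False: changed after review
```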

MAESTRO — the 7-layer agentic threat-modeling framework (Cloud Security Alliance / Ken Huang).