Claude 3.5 Sonnet — agentic threat model

8.0AIVSS 8.0 · High

Claude 3.5 Sonnet presents a high agentic risk profile primarily driven by its experimental 'computer use' capability, which allows it to interact directly with OS interfaces, combined with its advanced coding and planning skills. While Anthropic implements ASL-2 safety standards, the potential for prompt injection to hijack GUI-level automation poses significant security challenges.

OWASP AIVSS score rationale

AIVSS = (CVSS_Base + AARS) × Mitigation_Factor, where AARS = (10 − CVSS_Base) × (Factor_Sum / 10) × ThM

CVSS base 8.5AARS uplift 0.9Factor sum 6.0/10Threat ×1.0Mitigation ×0.85

Autonomy of Action		0.80
Goal-Driven Planning		0.80
Self-Modification		0.30
Dynamic Tool Use		0.90
Persistent Memory		0.40
Contextual Awareness		0.90
Dynamic Identity		0.20
Multi-Agent Interactions		0.20
Non-Determinism		0.70
Opacity & Reflexivity		0.80

Scored with the canonical OWASP AIVSS formula (AIVSS calculator reference); agentic risk factors estimated from the agent’s described capabilities.

MAESTRO 7-layer threat model

Per-layer threats for this agent. Layers tagged “not certain from listing” are general, caveated commentary where the public description didn’t pin that layer.

L1 · Foundation Models✓ mapped

As a state-of-the-art foundation model, Claude 3.5 Sonnet is highly susceptible to advanced prompt injection, jailbreaking, and adversarial attacks designed to bypass its safety alignment. Its 200K context window also increases the surface area for complex, multi-turn indirect prompt injection attacks.

L2 · Data Operations⚠ not certain from listing

Not certain from the listing — while it supports a 200K context window and Q&A in large knowledge bases, the specific data operations, vector stores, or RAG pipelines depend entirely on the customer's implementation.

L3 · Agent Frameworks✓ mapped

The model's 'computer use' API and 'Artifacts' collaborative workspace represent highly sensitive tool integrations. Insecure orchestration or prompt injection could lead to unauthorized tool execution, such as malicious browser automation, arbitrary keystrokes, or unauthorized code execution within the workspace.

L4 · Deployment & Infrastructure⚠ not certain from listing

Not certain from the listing — hosting is via Anthropic's API, cloud platforms, or Claude.ai, but the specific sandboxing and isolation mechanisms for the experimental 'computer use' API are not detailed here.

L5 · Evaluation & Observability✓ mapped

Anthropic incorporates an ASL-2 safety level with rigorous ethical evaluations and low hallucination rates, indicating a strong focus on alignment and evaluation, though real-time monitoring of 'computer use' actions remains a critical operational challenge.

L6 · Security & Compliance (cross-cutting)⚠ not certain from listing

Not certain from the listing — mentions ASL-2 safety level and ethical evaluations, but does not explicitly detail enterprise compliance certifications (like SOC2 or ISO) or identity/access management policies.

L7 · Agent Ecosystem⚠ not certain from listing

Not certain from the listing — while it supports collaborative workspaces (Artifacts) and computer automation, there is no explicit mention of multi-agent orchestration or marketplace interactions.

MAESTRO — the 7-layer agentic threat-modeling framework (Cloud Security Alliance / Ken Huang).