Claude 3.5 Sonnet — agentic threat model
Claude 3.5 Sonnet has a high agentic risk profile, driven mainly by its experimental 'computer use' capability, which lets it interact directly with OS interfaces, together with its advanced coding and planning skills. Anthropic deploys the model under ASL-2 safety standards, but prompt injection that hijacks GUI-level automation remains a significant security risk.
OWASP AIVSS score rationale
| Agentic risk factor | Score |
| --- | --- |
| Autonomy of Action | 0.80 |
| Goal-Driven Planning | 0.80 |
| Self-Modification | 0.30 |
| Dynamic Tool Use | 0.90 |
| Persistent Memory | 0.40 |
| Contextual Awareness | 0.90 |
| Dynamic Identity | 0.20 |
| Multi-Agent Interactions | 0.20 |
| Non-Determinism | 0.70 |
| Opacity & Reflexivity | 0.80 |
Scored with the canonical OWASP AIVSS formula (AIVSS calculator reference); agentic risk factors estimated from the agent’s described capabilities.
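The listing does not show how the ten factor scores are combined. As a rough illustration only, the sketch below assumes a plain sum of the factors (each taken to lie in [0, 1]) and ranks the highest contributors. The function names `aggregate_agentic_risk` and `top_factors` are hypothetical, and the result is not the full AIVSS score, which depends on inputs not given here.

```python
# Agentic risk factor scores copied from the table above.
FACTORS = {
    "Autonomy of Action": 0.80,
    "Goal-Driven Planning": 0.80,
    "Self-Modification": 0.30,
    "Dynamic Tool Use": 0.90,
    "Persistent Memory": 0.40,
    "Contextual Awareness": 0.90,
    "Dynamic Identity": 0.20,
    "Multi-Agent Interactions": 0.20,
    "Non-Determinism": 0.70,
    "Opacity & Reflexivity": 0.80,
}


def aggregate_agentic_risk(factors: dict[str, float]) -> float:
    """Sum the per-factor scores (ASSUMED aggregation, not the canonical formula)."""
    for name, score in factors.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name}: score {score} is outside [0, 1]")
    # Round to hide floating-point noise from adding decimal fractions.
    return round(sum(factors.values()), 2)


def top_factors(factors: dict[str, float], n: int = 3) -> list[str]:
    """Return the n highest-scoring factors; ties keep table order."""
    return sorted(factors, key=factors.get, reverse=True)[:n]


if __name__ == "__main__":
    print(aggregate_agentic_risk(FACTORS))
    print(top_factors(FACTORS))
```

Under this assumption the factors add up to 6.0 out of a possible 10, with Dynamic Tool Use and Contextual Awareness contributing the most, which is consistent with the emphasis on 'computer use' in the summary.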
MAESTRO 7-layer threat model
Per-layer threats for this agent. Layers tagged “not certain from the listing” carry general, caveated commentary because the public description did not pin down that layer.
Layer 1: Foundation Models
Like any state-of-the-art foundation model, Claude 3.5 Sonnet is exposed to prompt injection, jailbreaking, and adversarial inputs designed to bypass its safety alignment. Its 200K-token context window also enlarges the attack surface for multi-turn and indirect prompt injection.
Layer 2: Data Operations
Not certain from the listing: the model supports a 200K context window and Q&A over large knowledge bases, but the specific data operations, vector stores, and RAG pipelines depend entirely on the customer's implementation.
Layer 3: Agent Frameworks
The model's 'computer use' API and 'Artifacts' collaborative workspace are highly sensitive tool integrations. Insecure orchestration or prompt injection could lead to unauthorized tool execution, such as malicious browser automation, arbitrary keystrokes, or unauthorized code execution within the workspace.
Layer 4: Deployment and Infrastructure
Not certain from the listing: the model is hosted via Anthropic's API, cloud platforms, or Claude.ai, but the sandboxing and isolation mechanisms for the experimental 'computer use' API are not detailed.
Layer 5: Evaluation and Observability
The listing cites an ASL-2 safety level, ethical evaluations, and low hallucination rates, which indicates a focus on alignment and evaluation. Real-time monitoring of 'computer use' actions remains a critical operational challenge.
Layer 6: Security and Compliance
Not certain from the listing: it mentions the ASL-2 safety level and ethical evaluations but does not detail enterprise compliance certifications (such as SOC 2 or ISO) or identity and access management policies.
Layer 7: Agent Ecosystem
Not certain from the listing: the model supports collaborative workspaces (Artifacts) and computer automation, but there is no explicit mention of multi-agent orchestration or marketplace interactions.
MAESTRO — the 7-layer agentic threat-modeling framework (Cloud Security Alliance / Ken Huang).
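The tool-execution threat described above (prompt injection driving unauthorized 'computer use' actions) is commonly addressed by reviewing each proposed action before it runs. The sketch below shows one possible shape for such a gate. Everything in it is an assumption for illustration: the action names, the `ProposedAction` type, the `review_action` function, and the blocked patterns are hypothetical and do not reflect Anthropic's actual API schema or safeguards.

```python
from dataclasses import dataclass

# Hypothetical allowlist of GUI action types the orchestrator will execute.
ALLOWED_ACTIONS = {"screenshot", "mouse_move", "left_click", "type"}

# Hypothetical patterns that should never be typed on the model's behalf.
BLOCKED_SUBSTRINGS = ("sudo ", "rm -rf", "curl ", "ssh ")


@dataclass(frozen=True)
class ProposedAction:
    """An action the model asks the orchestrator to perform (illustrative type)."""
    kind: str
    text: str = ""


def review_action(action: ProposedAction) -> tuple[bool, str]:
    """Return (allowed, reason) for a single proposed action."""
    if action.kind not in ALLOWED_ACTIONS:
        return False, f"action type '{action.kind}' is not allowlisted"
    lowered = action.text.lower()
    for pattern in BLOCKED_SUBSTRINGS:
        if pattern in lowered:
            return False, f"text contains blocked pattern '{pattern.strip()}'"
    return True, "ok"


if __name__ == "__main__":
    for proposed in (
        ProposedAction("left_click"),
        ProposedAction("key_combo", "ctrl+alt+t"),
        ProposedAction("type", "sudo rm -rf /"),
    ):
        print(proposed.kind, review_action(proposed))
```

A substring denylist like this is easy to evade, so it would at most complement stronger controls such as sandboxing and human approval for sensitive actions. The point of the sketch is only that the orchestrator, not the model, makes the final decision on execution.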