playwright-skill — agentic threat model

AIVSS 9.3 · Critical

The playwright-skill agent presents a high-risk profile due to its ability to autonomously generate and execute arbitrary browser automation scripts within its environment, creating significant vectors for remote code execution, SSRF, and data exfiltration if prompt injection occurs.

OWASP AIVSS score rationale

AIVSS = (CVSS_Base + AARS) × Mitigation_Factor, where AARS = (10 − CVSS_Base) × (Factor_Sum / 10) × ThM

CVSS base 8.5 · AARS uplift 0.81 · Factor sum 4.9/10 · Threat ×1.1 · Mitigation ×1.0

Autonomy of Action		0.80
Goal-Driven Planning		0.70
Self-Modification		0.20
Dynamic Tool Use		0.90
Persistent Memory		0.10
Contextual Awareness		0.50
Dynamic Identity		0.30
Multi-Agent Interactions		0.10
Non-Determinism		0.80
Opacity & Reflexivity		0.50

Scored with the canonical OWASP AIVSS formula (AIVSS calculator reference); agentic risk factors estimated from the agent’s described capabilities.
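The arithmetic behind the headline score can be checked with a short sketch. It applies the formula exactly as stated above to the listed inputs; the function and variable names (`aars`, `aivss`, `FACTORS`) are illustrative and not part of any official AIVSS tooling.

```python
# Agentic risk factors as listed in the table above.
FACTORS = {
    "Autonomy of Action": 0.80,
    "Goal-Driven Planning": 0.70,
    "Self-Modification": 0.20,
    "Dynamic Tool Use": 0.90,
    "Persistent Memory": 0.10,
    "Contextual Awareness": 0.50,
    "Dynamic Identity": 0.30,
    "Multi-Agent Interactions": 0.10,
    "Non-Determinism": 0.80,
    "Opacity & Reflexivity": 0.50,
}


def aars(cvss_base: float, factor_sum: float, threat_multiplier: float) -> float:
    """AARS = (10 - CVSS_Base) * (Factor_Sum / 10) * ThM."""
    return (10 - cvss_base) * (factor_sum / 10) * threat_multiplier


def aivss(cvss_base: float, factor_sum: float,
          threat_multiplier: float, mitigation_factor: float) -> float:
    """AIVSS = (CVSS_Base + AARS) * Mitigation_Factor."""
    uplift = aars(cvss_base, factor_sum, threat_multiplier)
    return (cvss_base + uplift) * mitigation_factor


factor_sum = sum(FACTORS.values())
uplift = aars(8.5, factor_sum, 1.1)
score = aivss(8.5, factor_sum, 1.1, 1.0)

print(f"Factor sum:  {factor_sum:.1f}/10")
print(f"AARS uplift: {uplift:.2f}")
print(f"AIVSS:       {score:.1f}")
```

With the listed inputs, the uplift is 1.5 × 0.49 × 1.1 ≈ 0.81, so the score is 8.5 + 0.81 ≈ 9.3 with a mitigation factor of 1.0.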

MAESTRO 7-layer threat model

Per-layer threats for this agent. Layers tagged “not certain from listing” are general, caveated commentary where the public description didn’t pin that layer.

L1 · Foundation Models · ✓ mapped

The skill uses Claude to write scripts autonomously. It is highly vulnerable to prompt injection attacks that hijack the code generation process and cause it to produce malicious Playwright scripts.

L2 · Data Operations · ⚠ not certain from listing

Not certain from the listing: no details on RAG or vector stores are provided. However, parsing untrusted web page content during browser execution poses a high risk of indirect prompt injection or data poisoning.

L3 · Agent Frameworks · ✓ mapped

The framework orchestrates the generation and immediate execution of custom code. This is a severe tool misuse risk, because the agent can issue arbitrary network requests and perform arbitrary browser interactions.

L4 · Deployment & Infrastructure · ⚠ not certain from listing

Not certain from the listing: the description states that scripts run in the 'agent's environment' but does not specify sandboxing. If the environment lacks strict container isolation, there is a critical risk of host compromise, lateral movement, and SSRF.
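To make the SSRF concern concrete, the following is a minimal sketch of a pre-navigation check that flags URLs pointing at internal address space. It is an assumption-laden illustration, not something the listing describes: the helper name `is_internal_target` is hypothetical, and the check covers only IP-literal hosts and `localhost`. A DNS name would still have to be resolved and re-checked by the caller.

```python
import ipaddress
from urllib.parse import urlsplit


def is_internal_target(url: str) -> bool:
    """Return True if the URL has no host, names localhost, or uses an
    IP literal in a private, loopback, link-local, or reserved range.

    Hypothetical helper: DNS names other than localhost return False here
    and must be resolved and re-checked separately.
    """
    host = urlsplit(url).hostname
    if host is None:
        # No network host at all (e.g. file:// URLs); treat as unsafe.
        return True
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; this sketch cannot judge it without resolution.
        return False
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
    )


# A cloud metadata endpoint is a typical SSRF target from inside a host.
print(is_internal_target("http://169.254.169.254/latest/meta-data/"))
print(is_internal_target("https://example.com/"))
```

A check like this narrows the exposure but does not replace container or network isolation, since generated code could bypass an in-process guard.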

L5 · Evaluation & Observability · ⚠ not certain from listing

Not certain from the listing: no logging, guardrails, or execution monitoring are mentioned. This creates a major blind spot in which malicious or runaway browser automation scripts could run undetected.

L6 · Security & Compliance (cross-cutting) · ⚠ not certain from listing

Not certain from the listing: there are no details regarding authentication, authorization, or compliance policies governing which domains the agent is permitted to interact with.
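As an illustration of what a domain policy could look like, here is a minimal allowlist sketch. The listing describes no such control, so everything here is an assumption: `ALLOWED_DOMAINS` and `is_allowed` are hypothetical names, and the domains are placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical policy: the only sites the agent may visit.
ALLOWED_DOMAINS = frozenset({"example.com", "docs.example.org"})


def is_allowed(url: str, allowed: frozenset = ALLOWED_DOMAINS) -> bool:
    """Return True only for http(s) URLs whose host is an allowed domain
    or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower().rstrip(".")
    # Match on a dot boundary so "evilexample.com" does not pass as
    # "example.com".
    return any(host == d or host.endswith("." + d) for d in allowed)


print(is_allowed("https://example.com/login"))
print(is_allowed("https://example.com.evil.test/"))
```

In a Playwright script, a check of this kind could sit inside a request-interception handler registered with `page.route`, aborting any request that fails the policy. Whether the skill exposes a hook for that is not stated in the listing.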

L7 · Agent Ecosystem · ⚠ not certain from listing

Not certain from the listing: no multi-agent or marketplace interactions are described. If the skill is integrated into a larger ecosystem, however, compromised scripts could be used to attack adjacent agents.

MAESTRO — the 7-layer agentic threat-modeling framework (Cloud Security Alliance / Ken Huang).