ship — agentic threat model

AIVSS 9.9 · Critical

The 'ship' agent has an exceptionally high risk posture. It can execute arbitrary shell commands and orchestrates end-to-end CI/CD pipelines from commit to production, so any compromise becomes a direct vector for supply chain attacks.

OWASP AIVSS score rationale

AIVSS = (CVSS_Base + AARS) × Mitigation_Factor, where AARS = (10 − CVSS_Base) × (Factor_Sum / 10) × ThM

CVSS base 9.8 · AARS uplift 0.13 · Factor sum 5.8/10 · Threat ×1.1 · Mitigation ×1.0

Autonomy of Action		0.80
Goal-Driven Planning		0.90
Self-Modification		0.20
Dynamic Tool Use		0.90
Persistent Memory		0.30
Contextual Awareness		0.70
Dynamic Identity		0.50
Multi-Agent Interactions		0.40
Non-Determinism		0.60
Opacity & Reflexivity		0.50

Scored with the canonical OWASP AIVSS formula (see the AIVSS calculator reference). The agentic risk factors are estimates based on the agent's described capabilities.
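To check the arithmetic, the formula can be transcribed directly into Python. This is a minimal sketch that restates the formula as written on this page. It is not the official AIVSS calculator. The function names `aars` and `aivss` are illustrative, and the inputs are the CVSS base, threat multiplier, mitigation factor, and factor estimates listed above.

```python
# Agentic risk factors as estimated above, each on a 0-1 scale.
FACTORS = {
    "Autonomy of Action": 0.80,
    "Goal-Driven Planning": 0.90,
    "Self-Modification": 0.20,
    "Dynamic Tool Use": 0.90,
    "Persistent Memory": 0.30,
    "Contextual Awareness": 0.70,
    "Dynamic Identity": 0.50,
    "Multi-Agent Interactions": 0.40,
    "Non-Determinism": 0.60,
    "Opacity & Reflexivity": 0.50,
}


def aars(cvss_base: float, factor_sum: float, threat_multiplier: float) -> float:
    """Agentic uplift: (10 - CVSS_Base) * (Factor_Sum / 10) * ThM."""
    return (10 - cvss_base) * (factor_sum / 10) * threat_multiplier


def aivss(cvss_base: float, factor_sum: float,
          threat_multiplier: float, mitigation_factor: float) -> float:
    """AIVSS = (CVSS_Base + AARS) * Mitigation_Factor."""
    uplift = aars(cvss_base, factor_sum, threat_multiplier)
    return (cvss_base + uplift) * mitigation_factor


factor_sum = sum(FACTORS.values())
uplift = aars(9.8, factor_sum, 1.1)
score = aivss(9.8, factor_sum, 1.1, 1.0)
print(f"factor sum {factor_sum:.1f}/10, AARS {uplift:.2f}, AIVSS {score:.1f}")
# prints: factor sum 5.8/10, AARS 0.13, AIVSS 9.9
```

With a CVSS base of 9.8, the (10 − CVSS_Base) term limits the agentic uplift to a small fraction of a point. As a result, the Critical rating is driven almost entirely by the CVSS base.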

MAESTRO 7-layer threat model

Per-layer threats for this agent. Layers tagged “not certain from listing” contain general, caveated commentary for cases where the public description did not pin down that layer.

L1 · Foundation Models (⚠ not certain from listing)

Not certain from the listing: the agent relies on Claude Code (Anthropic Claude models) as its underlying foundation. Threats include prompt injection that leads to unauthorized shell command execution, or to malicious code being injected during the review/deploy phase.

L2 · Data Operations (⚠ not certain from listing)

Not certain from the listing — the agent operates on local git repositories, source code, and CI/CD configurations. Gaps in data provenance or poisoned local files could lead to malicious code being built and deployed.
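One way to narrow the poisoned-input gap is to stop the agent from building and deploying, without human review, any change that touches files controlling the pipeline itself. The sketch below assumes a list of changed paths, for example from `git diff --name-only <base>...HEAD`. It flags matches against a hypothetical set of pipeline-sensitive patterns. Both `SENSITIVE_PATTERNS` and `needs_human_review` are illustrative names, and the patterns would need tailoring to the actual repository.

```python
from fnmatch import fnmatchcase
from typing import Iterable

# Hypothetical examples of files that define how code is built or shipped.
# fnmatch's "*" also matches "/", so "*/Dockerfile" covers nested Dockerfiles.
SENSITIVE_PATTERNS = (
    ".github/workflows/*",
    ".gitlab-ci.yml",
    "Dockerfile",
    "*/Dockerfile",
    "Makefile",
    "package.json",
)


def needs_human_review(changed_paths: Iterable[str],
                       patterns: Iterable[str] = SENSITIVE_PATTERNS) -> list[str]:
    """Return the changed paths that match any pipeline-sensitive pattern."""
    patterns = tuple(patterns)
    return sorted(
        path for path in set(changed_paths)
        if any(fnmatchcase(path, pat) for pat in patterns)
    )


changed = ["src/app.py", ".github/workflows/deploy.yml", "services/api/Dockerfile"]
print(needs_human_review(changed))
```

`fnmatchcase` is used rather than `fnmatch` so that matching stays case-sensitive on every platform.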

L3 · Agent Frameworks (✓ mapped)

The agent orchestrates multi-step workflows (lint, test, review, deploy) and executes shell commands. This creates a severe tool-misuse risk: an attacker could manipulate the agent into executing arbitrary shell commands or skipping the lint/test gates.
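Gate bypass is easier to resist when stage ordering is enforced by ordinary code outside the model. Then an injected instruction such as "skip the tests" simply cannot be carried out. The following is a minimal sketch that assumes the four stages named above run in that order. `PipelineGate` and `GateError` are hypothetical names and are not part of 'ship' or Claude Code.

```python
# Stage order assumed from the listing: lint -> test -> review -> deploy.
STAGES = ("lint", "test", "review", "deploy")


class GateError(RuntimeError):
    """Raised when a stage is attempted before its prerequisites pass."""


class PipelineGate:
    def __init__(self) -> None:
        self._passed: set[str] = set()

    def check(self, stage: str) -> None:
        """Refuse to start `stage` unless every earlier stage has passed."""
        if stage not in STAGES:
            raise GateError(f"unknown stage: {stage!r}")
        earlier = STAGES[:STAGES.index(stage)]
        missing = [s for s in earlier if s not in self._passed]
        if missing:
            raise GateError(f"{stage!r} blocked; not yet passed: {missing}")

    def mark_passed(self, stage: str) -> None:
        """Record a successful stage; the prerequisite check runs first."""
        self.check(stage)
        self._passed.add(stage)


gate = PipelineGate()
try:
    gate.check("deploy")
except GateError as err:
    print(err)  # deploy is blocked until lint, test and review have passed
for stage in ("lint", "test", "review"):
    gate.mark_passed(stage)
gate.check("deploy")  # no exception now
```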

L4 · Deployment & Infrastructure (✓ mapped)

The agent executes shell commands across the pipeline, directly interacting with the host environment, git, and deployment targets. Without strict sandboxing, this presents extreme risks of host compromise, privilege escalation, and unauthorized production deployments.
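Short of a full sandbox, a first layer of containment is to run each command without a shell, against an executable allowlist, with a timeout and a stripped environment. The sketch below is an assumption-laden illustration, not a sandbox. `run_restricted` and the `ALLOWED` set are hypothetical, and real isolation from the host still requires a container, VM, or a similar boundary.

```python
import os
import shlex
import subprocess

# Hypothetical allowlist; tailor it to the repository's actual toolchain.
ALLOWED = frozenset({"git", "npm", "pytest", "ruff"})


def run_restricted(command: str, cwd: str, timeout: float = 300.0,
                   allowed: frozenset[str] = ALLOWED) -> subprocess.CompletedProcess:
    """Run one command without a shell, if its executable is allowlisted."""
    argv = shlex.split(command)
    if not argv:
        raise PermissionError("empty command")
    executable = os.path.basename(argv[0])
    if executable not in allowed:
        raise PermissionError(f"{executable!r} is not on the allowlist")
    # Stripped environment: inherited secrets such as cloud keys are not passed on.
    env = {"PATH": "/usr/local/bin:/usr/bin:/bin", "HOME": cwd, "LANG": "C.UTF-8"}
    # shell=False: ';', '&&' and '|' reach the program as literal arguments.
    return subprocess.run(argv, cwd=cwd, env=env, shell=False,
                          capture_output=True, text=True,
                          timeout=timeout, check=False)
```

An executable allowlist does not stop argument-level abuse, such as a `git` invocation that pushes to an attacker-controlled remote. It therefore complements isolation rather than replacing it.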

L5 · Evaluation & Observability (⚠ not certain from listing)

Not certain from the listing — there is no mention of built-in guardrails, logging, or anomaly detection to monitor the shell commands executed or to detect malicious modifications to the deployment pipeline.
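To make executed commands reviewable after the fact, each one can be appended to a hash-chained audit log. Every entry commits to the previous one, so editing an earlier entry breaks verification. This is a minimal in-memory sketch, and `AuditLog` and its fields are hypothetical. An attacker who can rewrite the entire log can also recompute the chain, so the latest hash should be copied somewhere the agent cannot write.

```python
import hashlib
import json
import time

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry
FIELDS = ("ts", "command", "exit_code", "cwd")


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AuditLog:
    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, command: str, exit_code: int, cwd: str) -> None:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        record = {"ts": time.time(), "command": command,
                  "exit_code": exit_code, "cwd": cwd}
        self.entries.append({**record, "prev": prev, "hash": _digest(prev, record)})

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry fails the check."""
        prev = GENESIS
        for entry in self.entries:
            record = {k: entry[k] for k in FIELDS}
            if entry["prev"] != prev or entry["hash"] != _digest(prev, record):
                return False
            prev = entry["hash"]
        return True


log = AuditLog()
log.append("npm test", 0, "/repo")
log.append("git push origin main", 0, "/repo")
print(log.verify())
```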

L6 · Security & Compliance, cross-cutting (⚠ not certain from listing)

Not certain from the listing — the agent requires access to highly sensitive credentials (git, CI/CD, cloud deployment keys) to function, but the listing does not specify how these secrets are managed, isolated, or audited.
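The listing leaves secret handling open. Two common measures are to give each pipeline stage only the credentials it needs, and to redact known secret values from command output before it is logged or fed back into the model's context. The following is a minimal sketch under those assumptions. `STAGE_SECRETS`, `DEPLOY_TOKEN`, `scoped_env`, and `redact` are hypothetical names.

```python
import os
from typing import Iterable, Mapping, Optional

# Hypothetical policy: only the deploy stage sees a deployment credential.
STAGE_SECRETS = {
    "lint": frozenset(),
    "test": frozenset(),
    "review": frozenset(),
    "deploy": frozenset({"DEPLOY_TOKEN"}),
}
BASE_VARS = frozenset({"PATH", "HOME", "LANG"})


def scoped_env(stage: str,
               source: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy only the base variables plus the secrets allowed for `stage`."""
    source = os.environ if source is None else source
    allowed = BASE_VARS | STAGE_SECRETS[stage]
    return {name: value for name, value in source.items() if name in allowed}


def redact(text: str, secret_values: Iterable[str]) -> str:
    """Replace every occurrence of a known secret value with a placeholder."""
    # Longest values first, so a secret that contains another is masked whole.
    for value in sorted(set(secret_values), key=len, reverse=True):
        if value:
            text = text.replace(value, "[REDACTED]")
    return text
```

Exact-value redaction misses transformed copies of a secret (for example base64- or URL-encoded forms), so it narrows leakage rather than preventing it.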

L7 · Agent Ecosystem (✓ mapped)

The agent operates as a plugin within the Claude Code ecosystem. Vulnerabilities or malicious updates in other plugins could compromise 'ship', leading to cascading failures and unauthorized supply chain deployments.
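A malicious or compromised plugin update is harder to slip in when plugin artifacts are pinned to reviewed SHA-256 digests and rejected on mismatch. This minimal sketch assumes that plugins are distributed as files and that a hypothetical `pins` mapping from file name to expected digest exists. It is not a Claude Code mechanism; the listing does not describe how plugins are verified.

```python
import hashlib
from pathlib import Path
from typing import Mapping


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(path: Path, pins: Mapping[str, str]) -> str:
    """Return the digest if `path` matches its pin; raise otherwise."""
    expected = pins.get(path.name)
    if expected is None:
        raise PermissionError(f"{path.name} has no pinned digest")
    actual = sha256_file(path)
    if actual != expected.lower():
        raise PermissionError(f"{path.name} digest mismatch: {actual}")
    return actual
```

Rejecting unpinned files by default means a newly added plugin also needs an explicit review step before it can load.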

MAESTRO — the 7-layer agentic threat-modeling framework (Cloud Security Alliance / Ken Huang).