Vibe Check MCP — agentic threat model

7.1AIVSS 7.1 · High

Vibe Check MCP acts as a meta-guardrail to prevent scope creep and alignment drift, but its reliance on non-deterministic LLM evaluations and multi-agent interactions introduces risks of bypass and trust abuse if the primary agent is compromised.

OWASP AIVSS score rationale

AIVSS = (CVSS_Base + AARS) × Mitigation_Factor, where AARS = (10 − CVSS_Base) × (Factor_Sum / 10) × ThM

CVSS base 5.5AARS uplift 1.57Factor sum 3.5/10Threat ×1.0Mitigation ×1.0

Autonomy of Action		0.30
Goal-Driven Planning		0.40
Self-Modification		0.10
Dynamic Tool Use		0.20
Persistent Memory		0.10
Contextual Awareness		0.60
Dynamic Identity		0.10
Multi-Agent Interactions		0.80
Non-Determinism		0.50
Opacity & Reflexivity		0.40

Scored with the canonical OWASP AIVSS formula (AIVSS calculator reference); agentic risk factors estimated from the agent’s described capabilities.

MAESTRO 7-layer threat model

Per-layer threats for this agent. Layers tagged “not certain from listing” are general, caveated commentary where the public description didn’t pin that layer.

L1 · Foundation Models⚠ not certain from listing

Not certain from the listing — The listing does not specify the underlying LLM used for the sanity-check agent. It likely relies on external models via MCP, making it susceptible to adversarial prompt injection that could bypass the alignment check.

L2 · Data Operations⚠ not certain from listing

Not certain from the listing — No details on data storage, vector databases, or training data are provided. It operates dynamically on plans passed to it at runtime.

L3 · Agent Frameworks✓ mapped

The agent acts as an MCP tool to orchestrate planning checks. Vulnerabilities in its tool integration or framework could allow bypasses of the guardrail, or allow a compromised agent to feed it malicious payloads.

L4 · Deployment & Infrastructure⚠ not certain from listing

Not certain from the listing — As an open-source MCP tool, deployment depends entirely on the host environment (e.g., Claude Desktop, local Node/Python host). No specific sandboxing or hosting details are provided.

L5 · Evaluation & Observability✓ mapped

This agent is explicitly an observability/guardrail tool designed to detect drift, scope creep, and alignment issues. However, if its own evaluation logic is gamed or bypassed, it fails to prevent cascading errors, creating a false sense of security.

L6 · Security & Compliance (cross-cutting)⚠ not certain from listing

Not certain from the listing — No explicit authentication, authorization, or compliance frameworks (like NIST or ISO) are mentioned in the public listing.

L7 · Agent Ecosystem✓ mapped

This agent is designed specifically for multi-agent interactions (calling a separate agent to sanity-check another agent's plan). It is highly exposed to A2A trust abuse, where a compromised target agent could feed it deceptive plans to bypass checks.

MAESTRO — the 7-layer agentic threat-modeling framework (Cloud Security Alliance / Ken Huang).