Vocode — agentic threat model

AIVSS 7.7 · High

Vocode's risk posture is centered on real-time voice orchestration, where vulnerabilities can lead to automated voice phishing (vishing), session hijacking, and telephony fraud. Its reliance on external LLM and STT/TTS providers introduces significant supply-chain and data-transit risks.

OWASP AIVSS score rationale

AIVSS = (CVSS_Base + AARS) × Mitigation_Factor, where AARS = (10 − CVSS_Base) × (Factor_Sum / 10) × ThM

CVSS base 7.5 · AARS uplift 1.05 · Factor sum 4.2/10 · Threat ×1.0 · Mitigation ×0.9

Autonomy of Action		0.50
Goal-Driven Planning		0.40
Self-Modification		0.10
Dynamic Tool Use		0.50
Persistent Memory		0.30
Contextual Awareness		0.60
Dynamic Identity		0.30
Multi-Agent Interactions		0.20
Non-Determinism		0.70
Opacity & Reflexivity		0.60

Scored with the canonical OWASP AIVSS formula (AIVSS calculator reference); agentic risk factors estimated from the agent’s described capabilities.
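The arithmetic behind the headline score can be checked directly from the formula and the factor table above. The following is a minimal sketch: the function and variable names are illustrative assumptions, and only the numeric inputs come from this listing.

```python
# Agentic risk factors exactly as listed in the table above.
FACTORS = {
    "Autonomy of Action": 0.50,
    "Goal-Driven Planning": 0.40,
    "Self-Modification": 0.10,
    "Dynamic Tool Use": 0.50,
    "Persistent Memory": 0.30,
    "Contextual Awareness": 0.60,
    "Dynamic Identity": 0.30,
    "Multi-Agent Interactions": 0.20,
    "Non-Determinism": 0.70,
    "Opacity & Reflexivity": 0.60,
}


def aivss_score(cvss_base, factors, threat_multiplier=1.0, mitigation_factor=1.0):
    """Apply AIVSS = (CVSS_Base + AARS) x Mitigation_Factor.

    AARS = (10 - CVSS_Base) x (Factor_Sum / 10) x ThM.
    Returns (factor_sum, aars, score) so each intermediate value can be inspected.
    """
    factor_sum = sum(factors.values())
    aars = (10 - cvss_base) * (factor_sum / 10) * threat_multiplier
    score = (cvss_base + aars) * mitigation_factor
    return factor_sum, aars, score


# Inputs taken from the score summary line: base 7.5, threat x1.0, mitigation x0.9.
factor_sum, aars, score = aivss_score(7.5, FACTORS, 1.0, 0.9)
print(f"factor sum = {factor_sum:.1f}, AARS = {aars:.2f}, AIVSS = {score:.1f}")
```

With these inputs the factor sum is 4.2, the AARS uplift is 2.5 × 0.42 = 1.05, and (7.5 + 1.05) × 0.9 = 7.695, which rounds to the reported 7.7.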

MAESTRO 7-layer threat model

Per-layer threats for this agent. Layers tagged “not certain from listing” carry general, caveated commentary because the public description did not give enough detail to assess that layer.

L1 · Foundation Models · ✓ mapped

Vocode integrates directly with external LLM providers. This layer is exposed to prompt injection delivered by voice (spoken or over-the-air injection), adversarial audio inputs that bypass LLM safety filters, and misaligned or hallucinated voice outputs.

L2 · Data Operations · ⚠ not certain from listing

Not certain from the listing: Vocode orchestrates real-time audio streams, so likely threats include exposure of transient voice data, transcriptions logged without adequate protection, and data exfiltration through compromised TTS/STT endpoints.

L3 · Agent Frameworks · ✓ mapped

Orchestrates the critical STT -> LLM -> TTS pipeline. Vulnerabilities include state desynchronization during real-time interruptions, race conditions in conversation handling, and insecure integration with telephony APIs (e.g., Twilio).
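One common way to limit state desynchronization when a caller interrupts is to tag each response with a turn number and discard any response whose turn is no longer current. The sketch below illustrates that idea only. `TurnGuard`, `respond`, and `demo` are hypothetical names, not part of Vocode's API, and `asyncio.sleep` stands in for LLM and TTS latency.

```python
import asyncio


class TurnGuard:
    """Hypothetical helper (not a Vocode class): tracks the current conversation turn."""

    def __init__(self):
        self._turn = 0
        self._lock = asyncio.Lock()

    async def begin_turn(self) -> int:
        # Each new user utterance, including an interruption, starts a new turn.
        async with self._lock:
            self._turn += 1
            return self._turn

    async def is_current(self, turn: int) -> bool:
        async with self._lock:
            return turn == self._turn


async def respond(guard, turn, delay, text, spoken):
    await asyncio.sleep(delay)  # stand-in for LLM + TTS latency
    # Only speak if no newer turn has started while this response was in flight.
    if await guard.is_current(turn):
        spoken.append(text)


async def demo():
    guard = TurnGuard()
    spoken = []
    first = await guard.begin_turn()
    slow = asyncio.create_task(respond(guard, first, 0.05, "reply to first utterance", spoken))
    second = await guard.begin_turn()  # the caller interrupts before the first reply is ready
    fast = asyncio.create_task(respond(guard, second, 0.01, "reply to interruption", spoken))
    await asyncio.gather(slow, fast)
    return spoken


result = asyncio.run(demo())
print(result)
```

In this run the reply to the first utterance is dropped because a newer turn began before it completed, so only the reply to the interruption is spoken. A real pipeline would also need to cancel in-flight audio playback, which this sketch does not cover.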

L4 · Deployment & Infrastructure · ⚠ not certain from listing

Not certain from the listing: because Vocode is an open-source framework, deployment is developer-managed. Key threats include insecure hosting of the orchestration server, exposed WebSockets for real-time audio, and leaked API keys for LLM/TTS providers.
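A small part of the leaked-key risk can be reduced by keeping provider keys out of source code and out of log output. The following is a minimal sketch under stated assumptions: the environment variable names `LLM_API_KEY` and `TTS_API_KEY` are hypothetical placeholders, not names that Vocode defines.

```python
import os

# Hypothetical variable names for illustration; use whatever names your deployment defines.
REQUIRED_KEYS = ("LLM_API_KEY", "TTS_API_KEY")


def load_secrets(env=os.environ, required=REQUIRED_KEYS):
    """Read provider keys from the environment and fail fast if any are missing."""
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required secrets: {', '.join(missing)}")
    return {name: env[name] for name in required}


def mask(secret, visible=4):
    """Return a log-safe form of a secret that shows only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Calling `load_secrets()` at startup surfaces a misconfigured deployment immediately, and passing keys through `mask()` before logging avoids writing full credentials to log files.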

L5 · Evaluation & Observability · ⚠ not certain from listing

Not certain from the listing: built-in guardrails and real-time monitoring are not explicitly mentioned. Gaps in logging voice interactions could let prompt injections or abuse go undetected.

L6 · Security & Compliance (cross-cutting) · ⚠ not certain from listing

Not certain from the listing: compliance details (such as HIPAA for voice data or GDPR for biometric/voice processing) are not specified. Telephony fraud and a lack of robust access controls on voice endpoints are the key risks.
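One access control that applies to telephony webhooks is verifying that an inbound request really came from the telephony provider. The sketch below assumes Twilio's documented request-signing scheme: an HMAC-SHA1 over the request URL followed by the POST parameters sorted by name, Base64-encoded and sent in the `X-Twilio-Signature` header. The function names are illustrative, and a production deployment should prefer the validator shipped in the provider's official helper library.

```python
import base64
import hashlib
import hmac


def expected_signature(auth_token: str, url: str, params: dict) -> str:
    """Compute the signature for a form-encoded POST webhook (assumed Twilio scheme)."""
    # Append each parameter name and value, sorted by name, with no delimiters.
    payload = url + "".join(f"{key}{params[key]}" for key in sorted(params))
    digest = hmac.new(auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_request(auth_token: str, url: str, params: dict, header_signature: str) -> bool:
    """Compare in constant time to avoid leaking signature prefixes through timing."""
    return hmac.compare_digest(expected_signature(auth_token, url, params), header_signature)
```

Requests that fail this check should be rejected before any audio stream or agent session is created.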

L7 · Agent Ecosystem · ⚠ not certain from listing

Not certain from the listing: Vocode supports customizable agents, but no multi-agent marketplace is explicitly mentioned. The main risks come from untrusted third-party STT/TTS/LLM integrations.

MAESTRO — the 7-layer agentic threat-modeling framework (Cloud Security Alliance / Ken Huang).