north-star — agentic threat model

AIVSS 7.2 · High

North-star is a behavior-shaping system-prompt plugin that bypasses RLHF safety constraints. With no built-in mitigations, it presents a high risk of alignment failure, prompt injection, and unpredictable model outputs.

OWASP AIVSS score rationale

AIVSS = (CVSS_Base + AARS) × Mitigation_Factor, where AARS = (10 − CVSS_Base) × (Factor_Sum / 10) × ThM and ThM is the threat multiplier.

CVSS base 6.1 · AARS uplift 1.05 · Factor sum 2.7/10 · Threat ×1.0 · Mitigation ×1.0

Agentic risk factor         Score
Autonomy of Action          0.20
Goal-Driven Planning        0.10
Self-Modification           0.60
Dynamic Tool Use            0.00
Persistent Memory           0.00
Contextual Awareness        0.30
Dynamic Identity            0.00
Multi-Agent Interactions    0.00
Non-Determinism             0.80
Opacity & Reflexivity       0.70

Scored with the canonical OWASP AIVSS formula (AIVSS calculator reference); agentic risk factors estimated from the agent’s described capabilities.
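
The arithmetic behind the headline score can be checked with a short script. This is a minimal sketch, not an official OWASP calculator: the function name `aivss_score`, the `FACTORS` dictionary, and the choice to display results rounded to one or two decimals are assumptions made for illustration. Only the formula and the input values come from the listing above.

```python
# Factor values copied from the agentic risk factor table above.
FACTORS = {
    "Autonomy of Action": 0.20,
    "Goal-Driven Planning": 0.10,
    "Self-Modification": 0.60,
    "Dynamic Tool Use": 0.00,
    "Persistent Memory": 0.00,
    "Contextual Awareness": 0.30,
    "Dynamic Identity": 0.00,
    "Multi-Agent Interactions": 0.00,
    "Non-Determinism": 0.80,
    "Opacity & Reflexivity": 0.70,
}


def aivss_score(cvss_base, factors, threat_multiplier=1.0, mitigation_factor=1.0):
    """Hypothetical helper applying the formula quoted in the listing.

    AARS  = (10 - CVSS_Base) * (Factor_Sum / 10) * ThM
    AIVSS = (CVSS_Base + AARS) * Mitigation_Factor
    Returns (aivss, aars, factor_sum), all unrounded.
    """
    factor_sum = sum(factors.values())
    aars = (10 - cvss_base) * (factor_sum / 10) * threat_multiplier
    aivss = (cvss_base + aars) * mitigation_factor
    return aivss, aars, factor_sum


score, aars, factor_sum = aivss_score(6.1, FACTORS)

# Rounding for display is an assumption; the listing does not state its rule.
print(f"Factor sum:  {factor_sum:.1f}/10")
print(f"AARS uplift: {aars:.2f}")
print(f"AIVSS:       {score:.1f}")
```

With the listed inputs, the factor sum is 2.7, the uplift is 3.9 × 0.27 ≈ 1.05, and the total is about 7.15, which is shown as 7.2 at one decimal place. Because both multipliers are 1.0, neither the threat multiplier nor the mitigation factor changes the result here.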

MAESTRO 7-layer threat model

Per-layer threats for this agent. Layers tagged “not certain from listing” carry general, caveated commentary because the public description did not pin down that layer.

L1 · Foundation Models · ✓ mapped

Directly targets the foundation model layer by overriding RLHF structural biases and modifying system prompts, which can lead to misaligned outputs, jailbreaks, or safety guardrail bypasses.

L2 · Data Operations · ⚠ not certain from listing

Not certain from the listing: no data operations or vector stores are mentioned. However, prompt overrides could affect how RAG data is interpreted if the plugin is integrated into a larger system.

L3 · Agent Frameworks · ✓ mapped

Acts as a plugin that overrides system prompts, introducing risks of insecure plugin integration, prompt injection vulnerabilities, and unintended behavioral changes in the host framework.

L4 · Deployment & Infrastructure · ⚠ not certain from listing

Not certain from the listing: the hosting environment is not specified. As an open-source plugin, its deployment security depends entirely on the host application's sandboxing and infrastructure controls.

L5 · Evaluation & Observability · ⚠ not certain from listing

Not certain from the listing: no built-in evaluation, guardrails, or monitoring are described, which makes it difficult to detect whether the prompt override causes behavioral drift or safety violations.

L6 · Security & Compliance (cross-cutting) · ⚠ not certain from listing

Not certain from the listing: no compliance controls, identity management, or auditing mechanisms are mentioned for this plugin.

L7 · Agent Ecosystem · ⚠ not certain from listing

Not certain from the listing: although it is a plugin, no explicit multi-agent interaction is described. A compromised prompt could still propagate malicious behavior if the host interacts with other agents.

MAESTRO — the 7-layer agentic threat-modeling framework (Cloud Security Alliance / Ken Huang).