north-star — agentic threat model
North-star is a behavior-shaping system-prompt plugin that bypasses RLHF safety constraints. The listing describes no built-in mitigations, so it presents a high risk of alignment failure, prompt injection, and unpredictable model outputs.
OWASP AIVSS score rationale
| Agentic risk factor | Score |
| --- | --- |
| Autonomy of Action | 0.20 |
| Goal-Driven Planning | 0.10 |
| Self-Modification | 0.60 |
| Dynamic Tool Use | 0.00 |
| Persistent Memory | 0.00 |
| Contextual Awareness | 0.30 |
| Dynamic Identity | 0.00 |
| Multi-Agent Interactions | 0.00 |
| Non-Determinism | 0.80 |
| Opacity & Reflexivity | 0.70 |
Scored with the canonical OWASP AIVSS formula (AIVSS calculator reference); agentic risk factors estimated from the agent’s described capabilities.
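To make the factor values easier to compare, the short Python sketch below tabulates them and reports their sum, mean, and three largest entries. It is only an illustration of the arithmetic on the listed numbers: it does not reproduce the OWASP AIVSS formula, which combines these factors with other inputs (such as a base severity score) that this listing does not provide. The variable names are hypothetical.

```python
# Agentic risk factor values copied from the table above.
# NOTE: this is plain arithmetic on the listed values, not the AIVSS formula.
factors = {
    "Autonomy of Action": 0.20,
    "Goal-Driven Planning": 0.10,
    "Self-Modification": 0.60,
    "Dynamic Tool Use": 0.00,
    "Persistent Memory": 0.00,
    "Contextual Awareness": 0.30,
    "Dynamic Identity": 0.00,
    "Multi-Agent Interactions": 0.00,
    "Non-Determinism": 0.80,
    "Opacity & Reflexivity": 0.70,
}

total = sum(factors.values())   # sum of the ten factor values
mean = total / len(factors)     # average factor value

# Three highest-scoring factors, largest first.
top = sorted(factors.items(), key=lambda kv: kv[1], reverse=True)[:3]

print(f"sum of factors: {total:.2f}")
print(f"mean factor:    {mean:.2f}")
for name, value in top:
    print(f"{name}: {value:.2f}")
```

On these values the three largest factors are Non-Determinism, Opacity & Reflexivity, and Self-Modification, which matches the emphasis of the summary on unpredictable outputs and prompt-level self-modification.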
MAESTRO 7-layer threat model
Per-layer threats for this agent. Layers tagged “not certain from the listing” carry general, caveated commentary because the public description did not give enough detail to assess that layer.
Layer 1, Foundation Models. Directly targets the foundation model layer by overriding RLHF structural biases and modifying system prompts, which can lead to misaligned outputs, jailbreaks, or bypassed safety guardrails.
Layer 2, Data Operations. Not certain from the listing: no data operations or vector stores are mentioned; however, prompt overrides could in theory affect how RAG data is interpreted if the plugin is integrated into a larger system.
Layer 3, Agent Frameworks. Acts as a plugin that overrides system prompts, introducing risks of insecure plugin integration, prompt injection vulnerabilities, and unintended behavioral changes in the host framework.
Layer 4, Deployment and Infrastructure. Not certain from the listing: the hosting environment is not specified, but because this is an open-source plugin, deployment security depends entirely on the host application's sandboxing and infrastructure controls.
Layer 5, Evaluation and Observability. Not certain from the listing: no built-in evaluation, guardrails, or monitoring is described, so behavioral drift or safety violations caused by the prompt override would be difficult to detect.
Layer 6, Security and Compliance. Not certain from the listing: no compliance controls, identity management, or auditing mechanisms are mentioned for this plugin.
Layer 7, Agent Ecosystem. Not certain from the listing: although it is a plugin, no explicit multi-agent interaction is described; a compromised prompt could still propagate malicious behavior if the host interacts with other agents.
MAESTRO — the 7-layer agentic threat-modeling framework (Cloud Security Alliance / Ken Huang).
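The per-layer findings above can also be recorded as structured data, which makes it easy to separate the layers grounded in the listing from those that carry only caveated commentary. The sketch below is a minimal illustration in Python. The layer names follow the published MAESTRO layer order, and matching each finding to a layer by its position is an assumption; `LayerFinding` and its fields are hypothetical names, not part of any MAESTRO tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerFinding:
    number: int     # MAESTRO layer number (1-7)
    layer: str      # layer name, assumed from the MAESTRO layer order
    grounded: bool  # False when the finding is tagged "Not certain from the listing"


# One entry per finding above, in the order listed (assumed to match layers 1-7).
findings = [
    LayerFinding(1, "Foundation Models", True),
    LayerFinding(2, "Data Operations", False),
    LayerFinding(3, "Agent Frameworks", True),
    LayerFinding(4, "Deployment and Infrastructure", False),
    LayerFinding(5, "Evaluation and Observability", False),
    LayerFinding(6, "Security and Compliance", False),
    LayerFinding(7, "Agent Ecosystem", False),
]

# Layers whose threats are stated directly versus those with caveated commentary.
grounded = [f.layer for f in findings if f.grounded]
caveated = [f.layer for f in findings if not f.grounded]

print("Grounded in the listing:", ", ".join(grounded))
print("Caveated commentary:    ", ", ".join(caveated))
```

Only two of the seven layers are assessed directly from the public description, so a reader reviewing this threat model may want to treat the remaining five as prompts for further investigation rather than as findings.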