Semantic Scholar MCP — agentic threat model

7.8AIVSS 7.8 · High

The Semantic Scholar MCP agent presents a moderate-to-high risk of indirect prompt injection due to its core feature of extracting and injecting untrusted full-text PDF content directly into the LLM context. While it only accesses public academic data and requires no credentials, the lack of built-in input sanitization or sandboxing for PDF parsing poses a threat to the orchestrating agent's integrity.

OWASP AIVSS score rationale

AIVSS = (CVSS_Base + AARS) × Mitigation_Factor, where AARS = (10 − CVSS_Base) × (Factor_Sum / 10) × ThM

CVSS base 7.3AARS uplift 0.51Factor sum 1.9/10Threat ×1.0Mitigation ×1.0

Autonomy of Action		0.30
Goal-Driven Planning		0.10
Self-Modification		0.00
Dynamic Tool Use		0.30
Persistent Memory		0.00
Contextual Awareness		0.50
Dynamic Identity		0.00
Multi-Agent Interactions		0.40
Non-Determinism		0.20
Opacity & Reflexivity		0.10

Scored with the canonical OWASP AIVSS formula (AIVSS calculator reference); agentic risk factors estimated from the agent’s described capabilities.

MAESTRO 7-layer threat model

Per-layer threats for this agent. Layers tagged “not certain from listing” are general, caveated commentary where the public description didn’t pin that layer.

L1 · Foundation Models✓ mapped

Highly vulnerable to indirect prompt injection. Malicious actors can embed adversarial instructions or system-override prompts within open-access PDFs on arXiv or Semantic Scholar, which are then parsed and injected directly into the host LLM's context window.

L2 · Data Operations✓ mapped

The agent relies on external, untrusted public data sources (arXiv, Semantic Scholar). While the data is public, there is a risk of data poisoning where attackers upload papers containing malicious payloads designed to exploit the parser or the consuming model.

L3 · Agent Frameworks✓ mapped

As an MCP tool, it exposes capabilities to search and extract PDFs. If the orchestrating framework blindly trusts the output of this tool, it can lead to tool misuse or downstream exploitation of other tools (e.g., file writing or shell execution) via injected instructions.

L4 · Deployment & Infrastructure⚠ not certain from listing

Not certain from the listing — the deployment context depends on how the host runs this MCP server. If the PDF extraction library (e.g., pdfminer, pypdf) runs unsandboxed, vulnerabilities in PDF parsing could lead to local denial of service or remote code execution on the host container.

L5 · Evaluation & Observability⚠ not certain from listing

Not certain from the listing — there is no mention of content filtering, guardrails, or anomaly detection to inspect extracted PDF text for malicious patterns or prompt injection payloads before passing it to the model.

L6 · Security & Compliance (cross-cutting)✓ mapped

The tool requires no API keys, simplifying deployment but lacking built-in access controls or rate limiting. Compliance risks are low regarding data privacy (public data only), but high regarding data integrity and input validation.

L7 · Agent Ecosystem✓ mapped

Designed specifically for multi-agent and tool-calling ecosystems via the Model Context Protocol (MCP). A compromise of this tool can propagate vertically, allowing a malicious PDF to hijack the parent agent and potentially compromise other connected agents or tools.

MAESTRO — the 7-layer agentic threat-modeling framework (Cloud Security Alliance / Ken Huang).