OWASP Top 10 for LLM Applications 2025: A Technical Breakdown

The 2025 update to the OWASP Top 10 for LLM Applications reflects how the attack surface has shifted as production deployments matured. Three architectural changes drove it: agentic frameworks, multi-modal inputs, and Retrieval-Augmented Generation (RAG). The update adds two new entries — System Prompt Leakage and Vector and Embedding Weaknesses — and expands Denial of Service into Unbounded Consumption.

The LLM Attack Surface

Before diving into individual vulnerabilities, it helps to visualize the modern LLM application stack and where each risk lives.

LLM01:2025 — Prompt Injection

Prompt injection occurs when an attacker manipulates the input to alter the model's behavior, bypassing safety constraints. This vulnerability stems from the model's inability to strictly distinguish between developer instructions and user input^[1].

Attack Variants

Direct Injections: Intentional inputs crafted by an attacker (e.g., "jailbreaks") to exploit the model^[2].
Indirect Injections: Malicious payloads embedded in external content (webpages, files, or resumes) that the LLM processes and interprets as instructions^[2].
Multimodal Injections: Attackers hide malicious instructions within images or other data types processed concurrently by multimodal AI^[1].

No silver bullet

Absolute prevention of prompt injection is mathematically unproven. Defenses include constraining model behavior via strict system prompts, segregating untrusted external content, and utilizing AI gateways/firewalls to inspect input and output payloads. Output should be evaluated using the RAG Triad (context relevance, groundedness, and question/answer relevance)^[1]^[2].

LLM02:2025 — Sensitive Information Disclosure

LLMs embedded in applications risk exposing personally identifiable information (PII), confidential business data, or proprietary algorithms through their output^[1]. Attackers can intentionally extract this data via prompt injections or execute model inversion attacks — repeatedly querying the API to reconstruct the training data or extract model weights^[2].

Model inversion is real

Researchers have demonstrated that targeted query sequences can reconstruct training data from model outputs. Without differential privacy, your training data is effectively public^[2].

Mitigation: Implement strict data sanitization pipelines before training or embedding data. Utilize federated learning architectures and apply differential privacy techniques to add statistical noise to outputs, rendering reverse-engineering computationally infeasible^[1].

LLM03:2025 — Supply Chain

Unlike traditional software supply chains, LLM applications must secure third-party pre-trained models, datasets, and fine-tuning plugins^[1]. The widespread use of LoRA (Low-Rank Adaptation) adapters introduces new attack vectors where a malicious adapter can compromise the base model upon merging^[2]. Attackers also target model merging services or publish backdoored models to open-source hubs (e.g., Hugging Face)^[2].

Mitigation: Maintain rigorous component inventory using Machine Learning Bill of Materials (ML-BOMs) and AI BOMs. Enforce third-party model integrity through cryptographic signing, file hashes, and extensive AI red teaming^[1]^[2].

LLM04:2025 — Data and Model Poisoning

Data poisoning is an integrity attack where threat actors manipulate the pre-training, fine-tuning, or embedding data to introduce vulnerabilities, biases, or sleeper agent backdoors into the model^[1]. Techniques like "Split-View Data Poisoning" or publishing malicious packages can cause the model to behave normally until a specific trigger is activated^[2].

Sleeper agent backdoors

A poisoned model can pass every standard evaluation benchmark and behave perfectly — until a specific trigger phrase or data pattern activates the hidden behavior. This makes detection extraordinarily difficult^[2].

Mitigation: Track data provenance and implement strict sandboxing to isolate the model from unverified external data sources. Monitor training loss anomalies and employ adversarial robustness tests (e.g., federated learning) to detect tampering^[1].

LLM05:2025 — Improper Output Handling

This vulnerability occurs when downstream systems process LLM-generated output without adequate sanitization or validation^[1]. Because LLM outputs are ultimately controlled by user prompts, treating them as trusted data can lead to:

Attack Vector	Description
XSS	LLM generates malicious `<script>` tags rendered in a web UI
SSRF	LLM output triggers internal network requests
SQL Injection	LLM-generated queries passed unsanitized to a database
RCE	LLM output executed as code in backend systems

Zero-trust model output

Adopt a zero-trust approach and treat all model output as untrusted user input. Implement context-aware output encoding (SQL escaping, HTML encoding), strictly parameterize queries, and enforce robust Content Security Policies (CSP) in web interfaces^[1].

LLM06:2025 — Excessive Agency

As LLMs are increasingly deployed as "agents" capable of invoking functions and interacting with external APIs, granting them excessive functionality, permissions, or autonomy leads to high-impact vulnerabilities^[1]. If an agent hallucinates or processes an injected prompt, it may autonomously execute destructive commands across downstream systems^[2].

Mitigation: Limit LLM extensions to the absolute minimum required and avoid open-ended functions (like arbitrary shell command execution). Enforce the complete mediation principle by requiring authorization checks in downstream systems and mandate human-in-the-loop approvals for high-risk actions^[1]^[2].

LLM07:2025 — System Prompt Leakage

System prompts steer the fundamental behavior of an LLM. Developers sometimes inadvertently embed sensitive infrastructure details, internal rules, or API credentials within these prompts^[1]. While the system prompt itself is not an intrinsic secret, the leakage of this context provides attackers with the exact guardrails they need to bypass, facilitating privilege escalation and backend exploitation^[7].

System prompts are not security boundaries

Never embed sensitive data, tokens, or credentials in the system prompt. Do not rely on system prompts for strict authorization controls — security policies must be enforced independently from the LLM via deterministic, auditable systems^[1]^[7].

LLM08:2025 — Vector and Embedding Weaknesses

Retrieval-Augmented Generation (RAG) relies on vector databases to provide context, introducing novel risks^[1].

Data Poisoning: Injecting hidden text (e.g., white text on a white resume) into documents ingested by the RAG system^[2].
Embedding Inversion Attacks: Recovering original source information from mathematical embeddings^[2].
Cross-Context Leakage: Multi-tenant architectures risk data bleeding between tenants without proper isolation^[1].

Mitigation: Deploy permission-aware vector databases to strictly isolate tenant data and enforce fine-grained access controls. Validate and sanitize all knowledge sources prior to embedding generation^[1]^[2].

LLM09:2025 — Misinformation

LLMs frequently hallucinate — filling gaps in their knowledge with statistically plausible but entirely fabricated information^[1]. When combined with user overreliance, this leads to significant security and operational failures.

Package hallucination attacks

Developers might ask an LLM for code libraries, only for the model to hallucinate a non-existent package. Attackers who anticipate this can publish malicious packages with the hallucinated name to compromise developer environments.

Mitigation: Utilize RAG to ground the model in verified external datasets. Implement automatic validation mechanisms for code outputs, design UIs that clearly communicate the model's reliability limits, and mandate independent cross-verification^[1].

LLM10:2025 — Unbounded Consumption

Replacing the previous "Denial of Service" classification, this vulnerability encompasses attacks designed to exhaust computational resources, inflict financial damage (Denial of Wallet), or steal proprietary IP^[1].

Attack Type	Goal	Impact
Context Flooding	Overload context windows	Service degradation / outage
Model Extraction	Clone proprietary model via API queries	IP theft
Denial of Wallet	Trigger unbounded compute costs	Financial damage

Mitigation: Implement strict API rate limiting, time-out constraints, and input size validation. Limit the exposure of logit_bias and logprobs in API responses to prevent statistical model extraction, and utilize watermarking frameworks to trace unauthorized use^[1]^[7].

Mapping Vulnerabilities to mitigations

#	Vulnerability	Primary Mitigation	Defense Layer
LLM01	Prompt Injection	AI firewalls, input/output inspection, RAG Triad	Application
LLM02	Sensitive Info Disclosure	Differential privacy, data sanitization	Model + Data
LLM03	Supply Chain	ML-BOMs, cryptographic signing, red teaming	Supply Chain
LLM04	Data Poisoning	Data provenance, sandboxing, anomaly detection	Data
LLM05	Improper Output Handling	Zero-trust output, encoding, CSP	Application
LLM06	Excessive Agency	Least privilege, human-in-the-loop	Application
LLM07	System Prompt Leakage	No secrets in prompts, external auth	Application
LLM08	Vector/Embedding Weaknesses	Permission-aware vector DBs, input sanitization	Data
LLM09	Misinformation	RAG grounding, cross-verification	Model + UX
LLM10	Unbounded Consumption	Rate limiting, watermarking, input validation	Infrastructure

Defense in depth is non-negotiable

No single mitigation addresses all ten risks. A layered defense strategy spanning application, model, data, supply chain, and infrastructure layers is the only viable approach to securing LLM applications].

References

[1]: OWASP — OWASP Top 10 for LLM Applications 2025

[2]: Greshake et al. — Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

[3]: Carlini et al. — Extracting Training Data from Large Language Models

[4]: Gu et al. — BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

[5]: Hubinger et al. — Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training

[6]: Mialon et al. — Augmented Language Models: A Survey

[7]: Perez & Ribeiro — Ignore This Title and HackAPrompt: Exposing Systemic Weaknesses of LLMs

[8]: Zou et al. — PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation

[9]: Tramèr et al. — Stealing Machine Learning Models via Prediction APIs