OWASP Top 10 for LLM Applications 2025: A Technical Breakdown
The 2025 update to the OWASP Top 10 for LLM Applications reflects the rise of agentic frameworks, RAG pipelines, and multi-modal AI. From prompt injection to unbounded consumption, here is a detailed technical breakdown of every vulnerability.
Saurabh Prakash
Author
The 2025 update to the OWASP Top 10 for Large Language Model (LLM) Applications reflects a deeper understanding of generative AI risks and real-world implementation architectures. Key architectural shifts — the rise of agentic frameworks, multi-modal capabilities, and Retrieval-Augmented Generation (RAG) — have prompted crucial updates, including the introduction of System Prompt Leakage and Vector and Embedding Weaknesses, as well as the expansion of Denial of Service into Unbounded Consumption.
The LLM Attack Surface
Before diving into individual vulnerabilities, it helps to visualize the modern LLM application stack and where each risk lives.
LLM01:2025 — Prompt Injection
Prompt injection occurs when an attacker manipulates the input to alter the model's behavior, bypassing safety constraints. This vulnerability stems from the model's inability to strictly distinguish between developer instructions and user input[1].
Attack Variants
- Direct Injections: Intentional inputs crafted by an attacker (e.g., "jailbreaks") to exploit the model[2].
- Indirect Injections: Malicious payloads embedded in external content (webpages, files, or resumes) that the LLM processes and interprets as instructions[2].
- Multimodal Injections: Attackers hide malicious instructions within images or other data types processed concurrently by multimodal AI[1].
No silver bullet
Absolute prevention of prompt injection is mathematically unproven. Defenses include constraining model behavior via strict system prompts, segregating untrusted external content, and utilizing AI gateways/firewalls to inspect input and output payloads. Output should be evaluated using the RAG Triad (context relevance, groundedness, and question/answer relevance)[1][2].
LLM02:2025 — Sensitive Information Disclosure
LLMs embedded in applications risk exposing personally identifiable information (PII), confidential business data, or proprietary algorithms through their output[1]. Attackers can intentionally extract this data via prompt injections or execute model inversion attacks — repeatedly querying the API to reconstruct the training data or extract model weights[2].
Model inversion is real
Researchers have demonstrated that targeted query sequences can reconstruct training data from model outputs. Without differential privacy, your training data is effectively public[2].
Mitigation: Implement strict data sanitization pipelines before training or embedding data. Utilize federated learning architectures and apply differential privacy techniques to add statistical noise to outputs, rendering reverse-engineering computationally infeasible[1].
LLM03:2025 — Supply Chain
Unlike traditional software supply chains, LLM applications must secure third-party pre-trained models, datasets, and fine-tuning plugins[1]. The widespread use of LoRA (Low-Rank Adaptation) adapters introduces new attack vectors where a malicious adapter can compromise the base model upon merging[2]. Attackers also target model merging services or publish backdoored models to open-source hubs (e.g., Hugging Face)[2].
Mitigation: Maintain rigorous component inventory using Machine Learning Bill of Materials (ML-BOMs) and AI BOMs. Enforce third-party model integrity through cryptographic signing, file hashes, and extensive AI red teaming[1][2].
LLM04:2025 — Data and Model Poisoning
Data poisoning is an integrity attack where threat actors manipulate the pre-training, fine-tuning, or embedding data to introduce vulnerabilities, biases, or sleeper agent backdoors into the model[1]. Techniques like "Split-View Data Poisoning" or publishing malicious packages can cause the model to behave normally until a specific trigger is activated[2].
Sleeper agent backdoors
A poisoned model can pass every standard evaluation benchmark and behave perfectly — until a specific trigger phrase or data pattern activates the hidden behavior. This makes detection extraordinarily difficult[2].
Mitigation: Track data provenance and implement strict sandboxing to isolate the model from unverified external data sources. Monitor training loss anomalies and employ adversarial robustness tests (e.g., federated learning) to detect tampering[1].
LLM05:2025 — Improper Output Handling
This vulnerability occurs when downstream systems process LLM-generated output without adequate sanitization or validation[1]. Because LLM outputs are ultimately controlled by user prompts, treating them as trusted data can lead to:
| Attack Vector | Description |
|---|---|
| XSS | LLM generates malicious <script> tags rendered in a web UI |
| SSRF | LLM output triggers internal network requests |
| SQL Injection | LLM-generated queries passed unsanitized to a database |
| RCE | LLM output executed as code in backend systems |
Zero-trust model output
Adopt a zero-trust approach and treat all model output as untrusted user input. Implement context-aware output encoding (SQL escaping, HTML encoding), strictly parameterize queries, and enforce robust Content Security Policies (CSP) in web interfaces[1].
LLM06:2025 — Excessive Agency
As LLMs are increasingly deployed as "agents" capable of invoking functions and interacting with external APIs, granting them excessive functionality, permissions, or autonomy leads to high-impact vulnerabilities[1]. If an agent hallucinates or processes an injected prompt, it may autonomously execute destructive commands across downstream systems[2].
Mitigation: Limit LLM extensions to the absolute minimum required and avoid open-ended functions (like arbitrary shell command execution). Enforce the complete mediation principle by requiring authorization checks in downstream systems and mandate human-in-the-loop approvals for high-risk actions[1][2].
LLM07:2025 — System Prompt Leakage
System prompts steer the fundamental behavior of an LLM. Developers sometimes inadvertently embed sensitive infrastructure details, internal rules, or API credentials within these prompts[1]. While the system prompt itself is not an intrinsic secret, the leakage of this context provides attackers with the exact guardrails they need to bypass, facilitating privilege escalation and backend exploitation[7].
LLM08:2025 — Vector and Embedding Weaknesses
Retrieval-Augmented Generation (RAG) relies on vector databases to provide context, introducing novel risks[1].
- Data Poisoning: Injecting hidden text (e.g., white text on a white resume) into documents ingested by the RAG system[2].
- Embedding Inversion Attacks: Recovering original source information from mathematical embeddings[2].
- Cross-Context Leakage: Multi-tenant architectures risk data bleeding between tenants without proper isolation[1].
Mitigation: Deploy permission-aware vector databases to strictly isolate tenant data and enforce fine-grained access controls. Validate and sanitize all knowledge sources prior to embedding generation[1][2].
LLM09:2025 — Misinformation
LLMs frequently hallucinate — filling gaps in their knowledge with statistically plausible but entirely fabricated information[1]. When combined with user overreliance, this leads to significant security and operational failures.
Package hallucination attacks
Developers might ask an LLM for code libraries, only for the model to hallucinate a non-existent package. Attackers who anticipate this can publish malicious packages with the hallucinated name to compromise developer environments.
Mitigation: Utilize RAG to ground the model in verified external datasets. Implement automatic validation mechanisms for code outputs, design UIs that clearly communicate the model's reliability limits, and mandate independent cross-verification[1].
LLM10:2025 — Unbounded Consumption
Replacing the previous "Denial of Service" classification, this vulnerability encompasses attacks designed to exhaust computational resources, inflict financial damage (Denial of Wallet), or steal proprietary IP[1].
| Attack Type | Goal | Impact |
|---|---|---|
| Context Flooding | Overload context windows | Service degradation / outage |
| Model Extraction | Clone proprietary model via API queries | IP theft |
| Denial of Wallet | Trigger unbounded compute costs | Financial damage |
Mitigation: Implement strict API rate limiting, time-out constraints, and input size validation. Limit the exposure of logit_bias and logprobs in API responses to prevent statistical model extraction, and utilize watermarking frameworks to trace unauthorized use[1][7].
Mapping Vulnerabilities to mitigations
| # | Vulnerability | Primary Mitigation | Defense Layer |
|---|---|---|---|
| LLM01 | Prompt Injection | AI firewalls, input/output inspection, RAG Triad | Application |
| LLM02 | Sensitive Info Disclosure | Differential privacy, data sanitization | Model + Data |
| LLM03 | Supply Chain | ML-BOMs, cryptographic signing, red teaming | Supply Chain |
| LLM04 | Data Poisoning | Data provenance, sandboxing, anomaly detection | Data |
| LLM05 | Improper Output Handling | Zero-trust output, encoding, CSP | Application |
| LLM06 | Excessive Agency | Least privilege, human-in-the-loop | Application |
| LLM07 | System Prompt Leakage | No secrets in prompts, external auth | Application |
| LLM08 | Vector/Embedding Weaknesses | Permission-aware vector DBs, input sanitization | Data |
| LLM09 | Misinformation | RAG grounding, cross-verification | Model + UX |
| LLM10 | Unbounded Consumption | Rate limiting, watermarking, input validation | Infrastructure |
Defense in depth is non-negotiable
No single mitigation addresses all ten risks. A layered defense strategy spanning application, model, data, supply chain, and infrastructure layers is the only viable approach to securing LLM applications].
References
[1]: OWASP — OWASP Top 10 for LLM Applications 2025
[2]: Greshake et al. — Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
[3]: Carlini et al. — Extracting Training Data from Large Language Models
[4]: Gu et al. — BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain
[5]: Hubinger et al. — Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training
[6]: Mialon et al. — Augmented Language Models: A Survey
[7]: Perez & Ribeiro — Ignore This Title and HackAPrompt: Exposing Systemic Weaknesses of LLMs
[8]: Zou et al. — PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation
[9]: Tramèr et al. — Stealing Machine Learning Models via Prediction APIs