Code & Context

OWASP Top 10 for LLM Applications 2025: A Technical Breakdown

The 2025 update to the OWASP Top 10 for LLM Applications reflects the rise of agentic frameworks, RAG pipelines, and multi-modal AI. From prompt injection to unbounded consumption, here is a detailed technical breakdown of every vulnerability.

Saurabh Prakash · Mar 27, 2026 · 10 min read

The 2025 update to the OWASP Top 10 for Large Language Model (LLM) Applications reflects a deeper understanding of generative AI risks and real-world implementation architectures. Key architectural shifts — the rise of agentic frameworks, multi-modal capabilities, and Retrieval-Augmented Generation (RAG) — have prompted crucial updates, including the introduction of System Prompt Leakage and Vector and Embedding Weaknesses, as well as the expansion of Denial of Service into Unbounded Consumption.


The LLM Attack Surface

Before diving into individual vulnerabilities, it helps to visualize the modern LLM application stack and where each risk lives.


LLM01:2025 — Prompt Injection

Prompt injection occurs when an attacker manipulates the input to alter the model's behavior, bypassing safety constraints. This vulnerability stems from the model's inability to strictly distinguish between developer instructions and user input[1].

Attack Variants

  • Direct Injections: Intentional inputs crafted by an attacker (e.g., "jailbreaks") to exploit the model[2].
  • Indirect Injections: Malicious payloads embedded in external content (webpages, files, or resumes) that the LLM processes and interprets as instructions[2].
  • Multimodal Injections: Attackers hide malicious instructions within images or other data types processed concurrently by multimodal AI[1].

No silver bullet

Absolute prevention of prompt injection is not currently achievable; no known defense is provably complete. Practical defenses include constraining model behavior via strict system prompts, segregating untrusted external content from instructions, and using AI gateways/firewalls to inspect input and output payloads. Outputs should be evaluated using the RAG Triad (context relevance, groundedness, and question/answer relevance)[1][2].
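Two of these defenses, segregating untrusted content and inspecting payloads, can be sketched in a few lines. The patterns and tag names below are hypothetical; real AI gateways combine such heuristics with trained classifiers and output-side checks.

```python
import re

# Hypothetical injection heuristics -- a weak signal on their own,
# meant to run alongside classifier-based and output-side defenses.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now (a|an|the) ",
    r"reveal (your|the) system prompt",
]

def wrap_untrusted(content: str) -> str:
    """Delimit external content so the prompt template can instruct the
    model to treat everything inside the tags as data, not instructions."""
    return f"<untrusted_content>\n{content}\n</untrusted_content>"

def flag_injection(content: str) -> bool:
    """Return True when the content matches a known injection heuristic."""
    return any(re.search(p, content, re.IGNORECASE) for p in INJECTION_PATTERNS)
```

Pattern matching alone is easy to evade, which is exactly why the section above stresses layered inspection rather than a single filter.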


LLM02:2025 — Sensitive Information Disclosure

LLMs embedded in applications risk exposing personally identifiable information (PII), confidential business data, or proprietary algorithms through their output[1]. Attackers can intentionally extract this data via prompt injections or execute model inversion attacks — repeatedly querying the API to reconstruct the training data or extract model weights[2].

Model inversion is real

Researchers have demonstrated that targeted query sequences can reconstruct training data from model outputs. Without differential privacy, your training data is effectively public[2].

Mitigation: Implement strict data sanitization pipelines before training or embedding data. Consider federated learning architectures and apply differential privacy techniques, which add calibrated statistical noise to outputs so that individual training records become substantially harder to reconstruct[1].
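The core idea of differential privacy can be illustrated in a few lines: add Laplace noise scaled to sensitivity/ε to an aggregate before releasing it. This is a minimal sketch for intuition, not a production DP library.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Inverse-transform sample from a Laplace(0, scale) distribution."""
    u = random.random() - 0.5
    u = max(u, -0.5 + 1e-12)  # avoid log(0) at the boundary
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(records, predicate, epsilon: float = 1.0) -> float:
    """Differentially private count: true count plus Laplace noise with
    scale = sensitivity / epsilon (a count query has sensitivity 1)."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)
```

Smaller ε means more noise and stronger privacy; the same mechanism generalizes to any query once its sensitivity is bounded.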


LLM03:2025 — Supply Chain

Unlike traditional software supply chains, LLM applications must secure third-party pre-trained models, datasets, and fine-tuning plugins[1]. The widespread use of LoRA (Low-Rank Adaptation) adapters introduces new attack vectors where a malicious adapter can compromise the base model upon merging[2]. Attackers also target model merging services or publish backdoored models to open-source hubs (e.g., Hugging Face)[2].

Mitigation: Maintain rigorous component inventory using Machine Learning Bill of Materials (ML-BOMs) and AI BOMs. Enforce third-party model integrity through cryptographic signing, file hashes, and extensive AI red teaming[1][2].
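File-hash verification of a downloaded model artifact is simple to sketch. The assumption here is that the expected digest comes from a trusted channel (e.g., the publisher's signed manifest), separate from the download itself.

```python
import hashlib

def verify_model_file(path: str, expected_sha256: str) -> bool:
    """Stream a model artifact through SHA-256 and compare against the
    digest published by the model provider. A mismatch means the file
    was corrupted or tampered with somewhere in the supply chain."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256.lower()
```

Hashes catch tampering after publication; cryptographic signatures, as mentioned above, additionally bind the artifact to the publisher's identity.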


LLM04:2025 — Data and Model Poisoning

Data poisoning is an integrity attack where threat actors manipulate the pre-training, fine-tuning, or embedding data to introduce vulnerabilities, biases, or sleeper agent backdoors into the model[1]. Techniques like "Split-View Data Poisoning" or publishing malicious packages can cause the model to behave normally until a specific trigger is activated[2].

Sleeper agent backdoors

A poisoned model can pass every standard evaluation benchmark and behave perfectly — until a specific trigger phrase or data pattern activates the hidden behavior. This makes detection extraordinarily difficult[2].

Mitigation: Track data provenance and implement strict sandboxing to isolate the model from unverified external data sources. Monitor training loss for anomalies and run adversarial robustness tests to detect tampering; federated learning can additionally limit the blast radius of any single poisoned data source[1].
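A crude form of training-loss anomaly monitoring can be sketched as a rolling z-score check. The window size and threshold below are illustrative, and a z-score is only one of many possible detectors.

```python
import statistics
from collections import deque

class LossAnomalyMonitor:
    """Flag training steps whose loss deviates sharply from the recent
    window -- a rough signal that a poisoned or trigger-laden batch may
    have entered the pipeline."""

    def __init__(self, window: int = 50, z_threshold: float = 4.0):
        self.history = deque(maxlen=window)
        self.z_threshold = z_threshold

    def observe(self, loss: float) -> bool:
        """Record one loss value; return True if it looks anomalous."""
        is_anomaly = False
        if len(self.history) >= 10:  # need a baseline before judging
            mean = statistics.fmean(self.history)
            stdev = statistics.pstdev(self.history) or 1e-9
            is_anomaly = abs(loss - mean) / stdev > self.z_threshold
        self.history.append(loss)
        return is_anomaly
```

Note the limitation called out above: a well-crafted sleeper-agent backdoor may leave no loss signature at all, so this complements rather than replaces provenance tracking.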


LLM05:2025 — Improper Output Handling

This vulnerability occurs when downstream systems process LLM-generated output without adequate sanitization or validation[1]. Because LLM outputs are ultimately controlled by user prompts, treating them as trusted data can lead to:

| Attack Vector | Description |
| --- | --- |
| XSS | LLM generates malicious `<script>` tags rendered in a web UI |
| SSRF | LLM output triggers internal network requests |
| SQL Injection | LLM-generated queries passed unsanitized to a database |
| RCE | LLM output executed as code in backend systems |

Zero-trust model output

Adopt a zero-trust approach and treat all model output as untrusted user input. Implement context-aware output encoding (SQL escaping, HTML encoding), strictly parameterize queries, and enforce robust Content Security Policies (CSP) in web interfaces[1].
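Two of these controls, context-aware HTML encoding and parameterized SQL, can be sketched with the Python standard library alone (the `users` table is a hypothetical example):

```python
import html
import sqlite3

def render_model_output(text: str) -> str:
    """Context-aware encoding for a web UI: escape model output before
    inserting it into HTML, neutralizing injected <script> tags."""
    return html.escape(text)

def lookup_user(conn: sqlite3.Connection, username: str):
    """Parameterized query: the model-derived value is bound as data,
    never spliced into the SQL string itself."""
    cur = conn.execute("SELECT id FROM users WHERE name = ?", (username,))
    return cur.fetchone()
```

The key design choice is that the encoding matches the destination context: HTML escaping for web output, parameter binding for SQL, and so on for shells, URLs, and templates.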


LLM06:2025 — Excessive Agency

As LLMs are increasingly deployed as "agents" capable of invoking functions and interacting with external APIs, granting them excessive functionality, permissions, or autonomy leads to high-impact vulnerabilities[1]. If an agent hallucinates or processes an injected prompt, it may autonomously execute destructive commands across downstream systems[2].

Mitigation: Limit LLM extensions to the absolute minimum required and avoid open-ended functions (like arbitrary shell command execution). Enforce the complete mediation principle by requiring authorization checks in downstream systems and mandate human-in-the-loop approvals for high-risk actions[1][2].
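The least-privilege and human-in-the-loop principles can be sketched as a mediation layer between the agent and its tools. Tool names here are hypothetical; the pattern is a closed allowlist plus an explicit human gate for destructive actions.

```python
# Hypothetical tool names for illustration.
ALLOWED_TOOLS = {"search_docs", "get_weather"}
HIGH_RISK_TOOLS = {"send_email", "delete_record"}

def dispatch_tool(name: str, args: dict, human_approved: bool = False) -> dict:
    """Mediate every tool call the agent requests: reject tools outside
    the allowlist, and gate high-risk tools behind human approval."""
    if name in HIGH_RISK_TOOLS:
        if not human_approved:
            raise PermissionError(f"{name} requires human-in-the-loop approval")
    elif name not in ALLOWED_TOOLS:
        raise PermissionError(f"{name} is not on the tool allowlist")
    return {"tool": name, "args": args, "status": "dispatched"}
```

Because the check runs in deterministic code outside the model, an injected prompt can request a destructive tool but cannot grant itself permission to use it.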


LLM07:2025 — System Prompt Leakage

System prompts steer the fundamental behavior of an LLM. Developers sometimes inadvertently embed sensitive infrastructure details, internal rules, or API credentials within these prompts[1]. While the system prompt itself is not an intrinsic secret, the leakage of this context provides attackers with the exact guardrails they need to bypass, facilitating privilege escalation and backend exploitation[7].

System prompts are not security boundaries

Never embed sensitive data, tokens, or credentials in the system prompt. Do not rely on system prompts for strict authorization controls — security policies must be enforced independently from the LLM via deterministic, auditable systems[1][7].
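Enforcing authorization outside the LLM can be as simple as a deterministic role-to-permission map in application code. The roles and actions below are hypothetical; the key property is that leaking the system prompt reveals no secrets and bypasses no controls.

```python
# Hypothetical role map; authorization lives in auditable application
# code, not in the prompt.
ROLE_PERMISSIONS = {
    "viewer": {"read_report"},
    "admin": {"read_report", "export_data"},
}

def authorize(role: str, action: str) -> bool:
    """Deterministic permission check enforced outside the LLM,
    regardless of what the model outputs or what prompt text an
    attacker has managed to extract."""
    return action in ROLE_PERMISSIONS.get(role, set())
```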


LLM08:2025 — Vector and Embedding Weaknesses

Retrieval-Augmented Generation (RAG) relies on vector databases to provide context, introducing novel risks[1].

  • Data Poisoning: Injecting hidden text (e.g., white text on a white resume) into documents ingested by the RAG system[2].
  • Embedding Inversion Attacks: Recovering original source information from mathematical embeddings[2].
  • Cross-Context Leakage: Multi-tenant architectures risk data bleeding between tenants without proper isolation[1].

Mitigation: Deploy permission-aware vector databases to strictly isolate tenant data and enforce fine-grained access controls. Validate and sanitize all knowledge sources prior to embedding generation[1][2].
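Tenant isolation in retrieval can be sketched as filtering candidates *before* ranking. This toy in-memory index stands in for a real vector database, which would push the same tenant filter into the index itself.

```python
import math

def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def retrieve(query_embedding, index, tenant_id, top_k: int = 3):
    """Permission-aware retrieval: restrict to the caller's tenant before
    similarity ranking, so another tenant's documents can never reach
    the context window."""
    candidates = [doc for doc in index if doc["tenant_id"] == tenant_id]
    candidates.sort(key=lambda d: cosine(query_embedding, d["embedding"]),
                    reverse=True)
    return candidates[:top_k]
```

Filtering after ranking would be weaker: a bug or injected instruction could surface cross-tenant hits before the filter runs.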


LLM09:2025 — Misinformation

LLMs frequently hallucinate — filling gaps in their knowledge with statistically plausible but entirely fabricated information[1]. When combined with user overreliance, this leads to significant security and operational failures.

Package hallucination attacks

Developers might ask an LLM for code libraries, only for the model to hallucinate a non-existent package. Attackers who anticipate this can publish malicious packages with the hallucinated name to compromise developer environments.

Mitigation: Utilize RAG to ground the model in verified external datasets. Implement automatic validation mechanisms for code outputs, design UIs that clearly communicate the model's reliability limits, and mandate independent cross-verification[1].
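One concrete validation mechanism against package hallucination is vetting every LLM-suggested dependency against a curated allowlist before installation. The allowlist below is hypothetical; in practice it would be an internal package registry or a reviewed dependency manifest.

```python
# Hypothetical curated allowlist for illustration.
APPROVED_PACKAGES = {"requests", "numpy", "pandas"}

def vet_llm_dependencies(suggested):
    """Split LLM-suggested dependencies into approved and unverified.
    Anything off the allowlist needs manual review: it may be a
    hallucinated name an attacker has registered on a public index."""
    approved = [p for p in suggested if p.lower() in APPROVED_PACKAGES]
    needs_review = [p for p in suggested if p.lower() not in APPROVED_PACKAGES]
    return {"approved": approved, "needs_review": needs_review}
```

Routing the `needs_review` set to a human, rather than straight to `pip install`, is what breaks the attack chain described above.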


LLM10:2025 — Unbounded Consumption

Replacing the previous "Denial of Service" classification, this vulnerability encompasses attacks designed to exhaust computational resources, inflict financial damage (Denial of Wallet), or steal proprietary IP[1].

| Attack Type | Goal | Impact |
| --- | --- | --- |
| Context Flooding | Overload context windows | Service degradation / outage |
| Model Extraction | Clone proprietary model via API queries | IP theft |
| Denial of Wallet | Trigger unbounded compute costs | Financial damage |

Mitigation: Implement strict API rate limiting, time-out constraints, and input size validation. Limit the exposure of logit_bias and logprobs in API responses to prevent statistical model extraction, and utilize watermarking frameworks to trace unauthorized use[1][7].
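Rate limiting against unbounded consumption is commonly implemented as a token bucket; a minimal sketch, where `cost` could be an estimate of LLM tokens rather than a flat per-request charge:

```python
import time

class TokenBucket:
    """Token-bucket limiter: each request spends `cost` units (e.g. an
    estimate of LLM tokens consumed), and the bucket refills at a fixed
    rate, capping both burst size and sustained spend."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then spend if enough remains."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Charging by estimated token cost rather than request count is what turns an ordinary rate limiter into a Denial-of-Wallet defense.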


Mapping Vulnerabilities to Mitigations

| # | Vulnerability | Primary Mitigation | Defense Layer |
| --- | --- | --- | --- |
| LLM01 | Prompt Injection | AI firewalls, input/output inspection, RAG Triad | Application |
| LLM02 | Sensitive Info Disclosure | Differential privacy, data sanitization | Model + Data |
| LLM03 | Supply Chain | ML-BOMs, cryptographic signing, red teaming | Supply Chain |
| LLM04 | Data Poisoning | Data provenance, sandboxing, anomaly detection | Data |
| LLM05 | Improper Output Handling | Zero-trust output, encoding, CSP | Application |
| LLM06 | Excessive Agency | Least privilege, human-in-the-loop | Application |
| LLM07 | System Prompt Leakage | No secrets in prompts, external auth | Application |
| LLM08 | Vector/Embedding Weaknesses | Permission-aware vector DBs, input sanitization | Data |
| LLM09 | Misinformation | RAG grounding, cross-verification | Model + UX |
| LLM10 | Unbounded Consumption | Rate limiting, watermarking, input validation | Infrastructure |

Defense in depth is non-negotiable

No single mitigation addresses all ten risks. A layered defense strategy spanning the application, model, data, supply chain, and infrastructure layers is the only viable approach to securing LLM applications.


References

[1]: OWASP — OWASP Top 10 for LLM Applications 2025

[2]: Greshake et al. — Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

[3]: Carlini et al. — Extracting Training Data from Large Language Models

[4]: Gu et al. — BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

[5]: Hubinger et al. — Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training

[6]: Mialon et al. — Augmented Language Models: A Survey

[7]: Perez & Ribeiro — Ignore This Title and HackAPrompt: Exposing Systemic Weaknesses of LLMs

[8]: Zou et al. — PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation

[9]: Tramèr et al. — Stealing Machine Learning Models via Prediction APIs