LLM Unbounded Consumption & DoS Attacks | OWASP LLM10
Richard Tuma · 2026-05-28 · via A10 Networks

Unbounded consumption occurs when an LLM application allows excessive or uncontrolled inference operations, leading to resource exhaustion, financial loss, service degradation, or model theft. Inference, the process of generating responses to prompts, is computationally expensive. When applications fail to restrict or manage inference usage, attackers can exploit this to cause denial of service (DoS), trigger denial of wallet (DoW), degrade service performance, extract or replicate models, or exploit side channels. Because LLMs often operate in cloud-based, pay-per-use environments, uncontrolled consumption can have immediate operational and financial consequences.

Key Takeaways

  • Unbounded consumption occurs when LLM applications allow uncontrolled inference, enabling attackers to cause denial of service, drain financial resources, degrade performance, or steal model intellectual property
  • Denial of Wallet (DoW) is a financially motivated attack unique to cloud-based AI services, where attackers generate excessive API operations specifically to exploit pay-per-use billing and impose unsustainable costs on the provider
  • Model extraction via API is a growing threat: attackers use crafted queries and prompt injection to collect sufficient outputs to replicate a functional shadow model, circumventing traditional IP protections without ever accessing model weights directly
  • Side-channel attacks can exploit input filtering mechanisms to harvest model weights and architectural information, compromising the model's security and enabling further downstream exploitation
  • Mitigation requires a combination of rate limiting, input size validation, resource allocation monitoring, logit/logprob obfuscation, output watermarking, and graceful degradation under heavy load

Why This Is Dangerous

LLMs consume significant CPU/GPU compute, memory, and network bandwidth, and every call draws on API usage quotas. If these resources are not tightly controlled, attackers can overwhelm infrastructure, drive unsustainable cloud costs, steal intellectual property, and force service outages. Unbounded consumption is both a security risk and an economic risk.

Common Vulnerability Patterns

Variable-length Input Flood: Attackers send numerous inputs of varying lengths to exploit processing inefficiencies, exhausting memory and compute.

Denial of Wallet (DoW): In pay-per-token or pay-per-inference environments, attackers generate high volumes of requests, creating unsustainable financial costs.

Continuous Input Overflow: Inputs repeatedly exceed the model’s context window, forcing expensive processing and causing degradation.

Resource-intensive Queries: Attackers craft prompts designed to trigger the most computationally expensive operations, such as complex reasoning chains, long generation sequences, or intricate structured outputs.

Model Extraction via API: Attackers systematically query the model API to collect outputs and reconstruct a partial or shadow model. This threatens intellectual property, competitive advantage, and model integrity.

Functional Model Replication: Attackers use the model to generate synthetic training data, then fine-tune another model to replicate its behavior, bypassing traditional extraction detection.

Side-channel Attacks: Attackers exploit input filtering mechanisms or architectural quirks to infer model weights, architecture details and internal behavior. This can facilitate deeper exploitation.

Example Attack Scenarios

Scenario 1 – Oversized Input

An attacker submits extremely large inputs, exhausting memory and CPU resources, potentially crashing the system.

Scenario 2 – High-volume Requests

A flood of API calls renders the service unavailable to legitimate users.

Scenario 3 – Expensive Query Exploitation

Specially crafted prompts trigger computationally heavy inference paths, causing performance collapse.

Scenario 4 – Denial of Wallet

An attacker exploits pay-per-use billing to create unsustainable costs.
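
To see how quickly this adds up, the exposure can be estimated with simple arithmetic. The sketch below is illustrative only: the function names `estimate_cost_usd` and `exceeds_budget`, and every price in it, are hypothetical and not taken from any provider's price list.

```python
def estimate_cost_usd(requests: int, input_tokens: int, output_tokens: int,
                      usd_per_1k_input: float, usd_per_1k_output: float) -> float:
    """Estimate the bill for a batch of identical requests under per-token pricing."""
    per_request = ((input_tokens / 1000) * usd_per_1k_input
                   + (output_tokens / 1000) * usd_per_1k_output)
    return requests * per_request


def exceeds_budget(spent_usd: float, next_cost_usd: float, daily_cap_usd: float) -> bool:
    """Return True when serving the next request would push spend past the cap."""
    return spent_usd + next_cost_usd > daily_cap_usd


# Hypothetical prices: $0.01 per 1K input tokens, $0.03 per 1K output tokens.
attack_cost = estimate_cost_usd(1_000_000, 2_000, 1_000, 0.01, 0.03)
```

Under these assumed prices, one million requests of 2,000 input tokens and 1,000 output tokens cost about $50,000, which is why a hard spend cap checked before each call is a useful backstop.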

Scenario 5 – Functional Model Replication

An attacker generates large amounts of synthetic data from the API and fine-tunes a competing model.

Scenario 6 – Filtering Bypass and Side-channel Attack

An attacker bypasses filtering to extract model details via side-channel methods.

Prevention and Mitigation Strategies

Input Validation: Enforce strict size limits, validate input length and structure, and reject excessive payloads.
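
A minimal sketch of such a check, assuming a chat-style request made of `role`/`content` dictionaries. The limits `MAX_PROMPT_CHARS` and `MAX_MESSAGES`, the exception name, and the message shape are all assumptions for illustration; a production check would more likely count tokens with the model's own tokenizer.

```python
MAX_PROMPT_CHARS = 8_000   # assumed limit; tune to the model's context window
MAX_MESSAGES = 50          # assumed cap on conversation turns per request
VALID_ROLES = {"system", "user", "assistant"}


class PayloadTooLarge(ValueError):
    """Raised when a request exceeds the configured size limits."""


def validate_request(messages: list) -> int:
    """Check structure and size before the request reaches the model.

    Returns the total number of prompt characters if the request is acceptable.
    """
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")
    if len(messages) > MAX_MESSAGES:
        raise PayloadTooLarge(f"too many messages: {len(messages)}")

    total = 0
    for msg in messages:
        if not isinstance(msg, dict):
            raise ValueError("each message must be a dict")
        content = msg.get("content")
        if msg.get("role") not in VALID_ROLES or not isinstance(content, str):
            raise ValueError("each message needs a valid role and string content")
        total += len(content)
        # Fail as soon as the limit is crossed instead of scanning the whole payload.
        if total > MAX_PROMPT_CHARS:
            raise PayloadTooLarge("prompt exceeds the character limit")
    return total
```

Rejecting early, before tokenization or inference, keeps the cost of handling a hostile payload close to zero.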

Limit Exposure of Logits and Logprobs: Restrict or obfuscate detailed probability outputs and avoid exposing sensitive inference metadata.
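
One way to apply this at an API gateway is to strip probability fields from the response before it leaves the service. The sketch below assumes a JSON-like response; the field names in `SENSITIVE_FIELDS` are assumptions and should be matched to whatever the serving stack actually emits.

```python
SENSITIVE_FIELDS = {"logprobs", "top_logprobs", "logits"}  # assumed field names


def strip_inference_metadata(payload):
    """Recursively drop probability fields from a JSON-like response object."""
    if isinstance(payload, dict):
        return {
            key: strip_inference_metadata(value)
            for key, value in payload.items()
            if key not in SENSITIVE_FIELDS
        }
    if isinstance(payload, list):
        return [strip_inference_metadata(item) for item in payload]
    return payload  # scalars pass through unchanged
```

The function returns a new object and leaves the original untouched, so the full response can still be logged internally if needed.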

Rate Limiting: Enforce request quotas, limit per-user or per-IP usage and apply API throttling.
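
A common way to implement per-client quotas is a token bucket. The following is a minimal in-memory sketch, not a production limiter: the class names, the `client_id` key (a user ID or IP address), and the injectable `clock` parameter are illustrative choices, and a multi-instance deployment would need shared state.

```python
import time


class TokenBucket:
    """Refills at `rate` tokens per second, holding at most `capacity` tokens."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Credit tokens for the time elapsed since the last call, up to capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RateLimiter:
    """Keeps one bucket per client so one caller cannot spend another's quota."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets = {}

    def allow(self, client_id: str, cost: float = 1.0) -> bool:
        if client_id not in self.buckets:
            self.buckets[client_id] = TokenBucket(self.rate, self.capacity, self.clock)
        return self.buckets[client_id].allow(cost)
```

Passing a `cost` proportional to the requested output length lets the same limiter throttle expensive prompts more aggressively than cheap ones.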

Resource Allocation Management: Monitor CPU/GPU usage, dynamically cap per-session resource allocation and prevent single-user resource monopolization.
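
As a rough illustration of a per-session cap, the sketch below tracks output tokens granted to each session and trims or refuses requests once the budget is spent. `SessionBudget` and its methods are hypothetical names, and output tokens stand in here for whatever resource unit (GPU seconds, memory) the deployment actually meters.

```python
from collections import defaultdict


class SessionBudget:
    """Caps the total output tokens any single session may consume."""

    def __init__(self, max_tokens_per_session: int):
        self.limit = max_tokens_per_session
        self.used = defaultdict(int)

    def reserve(self, session_id: str, requested_tokens: int) -> int:
        """Return the number of tokens granted for this call (0 means refuse)."""
        remaining = self.limit - self.used[session_id]
        granted = max(0, min(requested_tokens, remaining))
        self.used[session_id] += granted
        return granted

    def refund(self, session_id: str, unused_tokens: int) -> None:
        """Give back tokens that were reserved but not generated."""
        self.used[session_id] = max(0, self.used[session_id] - unused_tokens)
```

The granted value can be passed to the model as the maximum output length, so a session that is close to its cap gets shorter answers before it gets refused.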

Timeouts and Throttling: Set processing time limits and throttle long-running requests.
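
A processing time limit can be sketched with `asyncio.wait_for`, which cancels the wrapped task when the deadline passes. Here `generate` is a hypothetical async callable standing in for the real inference call, and returning `None` on timeout is just one possible policy.

```python
import asyncio


async def generate_with_deadline(generate, prompt: str, timeout_s: float):
    """Run an inference coroutine, giving up once `timeout_s` seconds have elapsed."""
    try:
        return await asyncio.wait_for(generate(prompt), timeout=timeout_s)
    except asyncio.TimeoutError:
        # The underlying task has been cancelled; the caller decides how to respond.
        return None
```

Cancellation only frees resources if the inference backend also stops work when the client task is cancelled, so the backend's own timeout settings should be aligned with this limit.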

Sandbox Techniques: Restrict model access to internal services, limit network reachability and control data access scope. This also mitigates insider risks and side-channel exposure.

Logging, Monitoring and Anomaly Detection: Track unusual request patterns, detect abnormal inference volumes and respond to suspicious consumption spikes.
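
A very simple form of spike detection compares the current usage count against a rolling baseline. The sketch below uses a z-score over recent per-interval counts; the function name, the threshold of 3, and the `min_spread` floor are assumptions, and real traffic usually needs seasonality-aware baselines.

```python
from statistics import mean, pstdev


def is_consumption_spike(history: list, current: float, threshold: float = 3.0,
                         min_samples: int = 5, min_spread: float = 1.0) -> bool:
    """Flag `current` when it sits more than `threshold` deviations above the baseline.

    `history` holds recent per-interval counts (requests or tokens) for one client.
    """
    if len(history) < min_samples:
        return False  # not enough data to form a baseline
    baseline = mean(history)
    # Floor the spread so perfectly flat traffic does not flag tiny increases.
    spread = max(pstdev(history), min_spread)
    return (current - baseline) / spread > threshold
```

Tracking token counts per client, rather than request counts alone, helps surface extraction attempts that use few but very long requests.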

Watermarking: Embed detectable signals in outputs to identify unauthorized replication or misuse.

Graceful Degradation: Under heavy load, maintain partial service rather than full failure.
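
One possible degradation policy is to shorten responses as utilization rises and refuse new work only at saturation. The thresholds and defaults below are arbitrary assumptions chosen for illustration, and `load` is assumed to be a utilization figure between 0 and 1 supplied by the monitoring stack.

```python
def degraded_max_tokens(load: float, normal_max: int = 1024, floor: int = 128) -> int:
    """Pick an output-length cap for the current load; 0 means reject the request."""
    if load < 0.7:
        return normal_max                    # normal service
    if load < 0.9:
        return max(floor, normal_max // 2)   # shorter answers under pressure
    if load < 1.0:
        return floor                         # minimal answers near saturation
    return 0                                 # shed load instead of failing outright
```

Other degradation levers follow the same pattern, for example routing to a smaller model or disabling optional tool calls at the higher load tiers.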

Limit Queued Actions and Scale Robustly: Restrict queue depth, implement dynamic scaling and use load balancing.
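
Restricting queue depth can be sketched with the standard library's bounded `queue.Queue`. `BoundedInferenceQueue` is a hypothetical wrapper; the point is that a full queue rejects new work immediately (for example with an HTTP 429) instead of letting a backlog grow without limit.

```python
import queue


class BoundedInferenceQueue:
    """Holds pending inference jobs up to a fixed depth and rejects the overflow."""

    def __init__(self, max_depth: int):
        if max_depth < 1:
            # queue.Queue treats maxsize <= 0 as unbounded, which defeats the purpose.
            raise ValueError("max_depth must be at least 1")
        self._q = queue.Queue(maxsize=max_depth)

    def submit(self, job) -> bool:
        """Enqueue a job; return False if the queue is already full."""
        try:
            self._q.put_nowait(job)
            return True
        except queue.Full:
            return False

    def next_job(self):
        """Return the oldest pending job, or None if nothing is waiting."""
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None
```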

Adversarial Robustness Training: Train models to recognize and mitigate extraction attempts.

Glitch Token Filtering: Maintain lists of known glitch tokens and scan outputs before adding them to context windows.
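
A minimal sketch of that scan, assuming the glitch-token list is maintained as a set of strings. The tokens shown are placeholders rather than real glitch tokens, and the function names are illustrative.

```python
# Placeholder entries; a real list depends on the model's tokenizer.
GLITCH_TOKENS = {"<glitch_a>", "<glitch_b>"}


def contains_glitch_token(text: str, glitch_tokens=GLITCH_TOKENS) -> bool:
    """Return True if any known glitch token appears in the text."""
    return any(token in text for token in glitch_tokens)


def append_if_clean(context: list, output: str, glitch_tokens=GLITCH_TOKENS) -> bool:
    """Add a model output to the context window only if it passes the scan."""
    if contains_glitch_token(output, glitch_tokens):
        return False  # caller can drop, regenerate, or log the output
    context.append(output)
    return True
```

Substring matching is the simplest option; matching on token IDs after tokenization is stricter but ties the filter to one tokenizer.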

Implement Access Controls: Apply role-based access control (RBAC), enforce least privilege, and restrict access to training environments and repositories.
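
In its simplest form, an RBAC check maps roles to explicit permissions and denies everything else. The role names and permission strings below are hypothetical; in practice they would come from the organization's identity provider or policy engine.

```python
# Hypothetical role-to-permission mapping; anything not listed is denied.
ROLE_PERMISSIONS = {
    "viewer": {"model:query"},
    "ml_engineer": {"model:query", "training:read"},
    "ml_admin": {"model:query", "training:read", "training:write", "registry:deploy"},
}


def is_allowed(roles: list, permission: str) -> bool:
    """Return True only if one of the caller's roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

Unknown roles resolve to an empty permission set, so the default outcome is denial, which is the least-privilege behavior the control calls for.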

Centralized Model Inventory: Maintain governed registries for production models.

Use Automated MLOps Deployment: Use governed pipelines with approval workflows and tracking to prevent unauthorized deployments.

The Core Security Principle

LLMs are high-cost computational systems. If access is not controlled, attackers can exhaust resources, drain finances, extract intellectual property, or collapse availability. Unbounded inference equals unbounded risk.

The Key Takeaway

Unbounded consumption is a denial-of-service risk, a financial exploitation risk, and a model theft risk. Mitigating it requires strict usage limits, resource governance, monitoring and anomaly detection, controlled API exposure, and secure MLOps practices. Control the inputs, control the usage, and control the cost.
