tool-call-budgets: Stop Runaway Agent Loops Before They Hit Your Invoice
Mukunda Rao · 2026-05-26 · via DEV Community

The incident report said the agent called search() 847 times in a single run.

Nobody noticed until the invoice arrived. The agent was tasked with researching a topic. It got into a loop: search, parse the result, decide it needed more context, search again. The search results kept being relevant enough to continue. The termination condition never triggered. The agent was not broken. It was doing exactly what it was told, just too many times, for too long.

The fix people reach for is a global step counter. if steps > 100: break. That works until you have an agent with ten tools and one tool accounts for ninety of those steps. A global cap is too blunt. You want to say: search can run 5 times, fetch_url can run 10 times, send_email can run 1 time. Each tool has its own ceiling.

tool-call-budgets is the library that does that.
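
To make the difference concrete, here is a small hand-rolled sketch that does not use the library at all. The call trace and both helper functions are hypothetical, made up for illustration: one tool dominates the run, and the two capping strategies are compared on the same trace.

```python
from collections import Counter

# Hypothetical call trace: search dominates the run.
trace = ["search"] * 90 + ["fetch_url"] * 9 + ["send_email"]


def run_with_global_cap(calls, max_steps):
    """Stop after max_steps total calls, no matter which tool made them."""
    executed = Counter()
    for step, tool in enumerate(calls, start=1):
        if step > max_steps:
            break
        executed[tool] += 1
    return executed


def run_with_per_tool_caps(calls, caps):
    """Stop at the first call that would exceed its own tool's cap."""
    executed = Counter()
    for tool in calls:
        if executed[tool] >= caps[tool]:
            return executed, tool  # report which tool tripped
        executed[tool] += 1
    return executed, None


global_result = run_with_global_cap(trace, max_steps=100)
per_tool_result, tripped = run_with_per_tool_caps(
    trace, {"search": 5, "fetch_url": 10, "send_email": 1}
)
```

With the global cap of 100, all 90 search calls go through because the total never exceeds the cap. With per-tool caps, the run stops as soon as search asks for its sixth call.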


The shape of the fix

The core API is three things: a ToolBudgets dict, a RunContext, and a @guarded decorator.

from tool_call_budgets import ToolBudgets, RunContext, guarded, ToolBudgetExceeded

# Define your limits once.
budgets = ToolBudgets({
    "search": 5,
    "fetch_url": 10,
    "send_email": 1,
})

# Create a context for each agent invocation.
ctx = RunContext(budgets)


Wrap each tool function with @guarded:

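import requests

@guarded(ctx, name="search")
def search(query: str) -> list[str]:
    return my_search_api(query)  # my_search_api: placeholder for your search client

@guarded(ctx, name="fetch_url")
def fetch_url(url: str) -> str:
    # A timeout keeps one slow URL from stalling the whole run.
    return requests.get(url, timeout=10).text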

@guarded(ctx, name="send_email")
def send_email(to: str, body: str) -> None:
    my_email_client.send(to=to, body=body)


Now pass those wrapped functions to your agent. When search is called a sixth time, ToolBudgetExceeded is raised before the function body runs.

try:
    result = run_agent(
        tools=[search, fetch_url, send_email],
        task="Research the competitive landscape for ...",
    )
except ToolBudgetExceeded as e:
    print(f"Budget exceeded: {e.tool_name} hit its limit of {e.limit}")
    # log it, return partial results, alert, etc.


The exception carries the tool name and the limit. Your agent harness catches it and decides what to do: return partial results, retry with a smaller task, page an operator.
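
As a rough sketch of the "return partial results" option, the snippet below keeps whatever was gathered before the budget ran out. To stay runnable without the library installed, ToolBudgetExceeded here is a local stand-in that models only the tool_name and limit attributes mentioned above. make_limited_search and research are hypothetical helpers, not part of tool-call-budgets.

```python
class ToolBudgetExceeded(Exception):
    """Local stand-in for the library's exception (assumption: only the
    tool_name and limit attributes are modeled)."""

    def __init__(self, tool_name: str, limit: int) -> None:
        super().__init__(f"{tool_name} exceeded its limit of {limit}")
        self.tool_name = tool_name
        self.limit = limit


def make_limited_search(limit: int):
    """Hypothetical tool that raises once it has been called `limit` times."""
    calls = 0

    def search(query: str) -> str:
        nonlocal calls
        if calls >= limit:
            raise ToolBudgetExceeded("search", limit)
        calls += 1
        return f"result for {query!r}"

    return search


def research(queries: list[str], search) -> dict:
    """Run queries in order; on a budget error, keep what was gathered."""
    findings: list[str] = []
    try:
        for query in queries:
            findings.append(search(query))
    except ToolBudgetExceeded as exc:
        return {
            "status": "partial",
            "stopped_by": exc.tool_name,
            "limit": exc.limit,
            "findings": findings,
        }
    return {"status": "complete", "findings": findings}


outcome = research([f"topic {i}" for i in range(8)], make_limited_search(5))
```

The caller gets a structured result either way, so a budget stop can be logged or surfaced to the user instead of crashing the request.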


What it does NOT do

  • It does not track token or USD cost. If you need a dollar cap, pair this library with token-budget-py. These two cover different axes: call count vs. spend.
  • It does not reset automatically over time. If you want a "5 search calls per minute" window, that is llm-budget-window. tool-call-budgets is per-run scoped, not time-windowed.
  • It does not inspect or validate tool arguments. If you need to reject a call because the args look wrong, that is agentvet. tool-call-budgets only counts calls, regardless of what is passed.
  • It does not modify or wrap the return value. The decorated function runs normally if within budget. The only change is the guard on the way in.

Inside the lib: the RunContext design

The design choice that took the most iteration was how to scope the counters.

The obvious approach is a shared global dict: call_counts["search"] += 1. That is simple but breaks immediately when you run two agents at the same time. Agent A's search calls pollute Agent B's counter. You end up refusing Agent B's fourth call because Agent A already burned two of the five slots.
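
The pollution is easy to reproduce in a few lines. This is an illustrative sketch only; SEARCH_LIMIT and search_with_global_counter are hypothetical names, and the two "agents" are simulated by plain function calls.

```python
from collections import defaultdict

SEARCH_LIMIT = 5
call_counts = defaultdict(int)  # module-level: shared by every agent in the process


def search_with_global_counter(agent: str) -> bool:
    """Return True if the call is allowed, False if the shared cap refuses it.

    The agent argument is accepted only to show that the counter
    ignores who is calling.
    """
    if call_counts["search"] >= SEARCH_LIMIT:
        return False
    call_counts["search"] += 1
    return True


# Agent A makes two calls, then Agent B tries to make five of its own.
a_results = [search_with_global_counter("A") for _ in range(2)]
b_results = [search_with_global_counter("B") for _ in range(5)]
```

Agent B is cut off before it has used five calls of its own, purely because of what Agent A did earlier.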

The next approach is a thread-local. One counter per thread. That works for thread-based concurrency but falls apart with async agents running on an event loop. Multiple coroutines share a thread. Thread-local storage looks like one agent to all of them.
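
A short sketch of that failure, using only the standard library. The agent coroutine and the search_calls attribute are hypothetical. Two coroutines run on one event loop thread and both write to the same threading.local() slot.

```python
import asyncio
import threading

local_state = threading.local()


async def agent(name: str, calls: int) -> int:
    """Bump a thread-local counter per simulated tool call, then report it."""
    for _ in range(calls):
        local_state.search_calls = getattr(local_state, "search_calls", 0) + 1
        await asyncio.sleep(0)  # yield to the event loop, as a real tool call would
    return local_state.search_calls


async def main() -> list[int]:
    # Both coroutines run on the same thread, so they share local_state.
    return await asyncio.gather(agent("A", 3), agent("B", 3))


seen = asyncio.run(main())
```

Each agent made only three calls, yet both report a count above three, because the thread-local counter cannot tell the coroutines apart.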

RunContext is the answer. You create a new RunContext for each agent run. It is a plain object that holds the counters. You pass it to @guarded when you decorate your functions. The counters live on the context, not on a global or thread-local.

# Each invocation gets its own context.
def handle_request(task: str) -> str:
    ctx = RunContext(budgets)  # fresh counters, zero cost

    @guarded(ctx, name="search")
    def search(query: str) -> list[str]:
        return my_search_api(query)

    return run_agent(tools=[search], task=task)


The RunContext is thread-safe. Its internal counter uses a lock. Two threads calling guarded functions that share the same context, which is rare but possible in some agent frameworks, cannot race past the cap.
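
The library's internals are not shown in this post, so the following is only a sketch of how a lock-guarded per-run counter and decorator could be written. SketchContext, sketch_guarded, and SketchBudgetExceeded are hypothetical names chosen to avoid any confusion with the real classes.

```python
import functools
import threading
from concurrent.futures import ThreadPoolExecutor


class SketchBudgetExceeded(Exception):
    def __init__(self, tool_name, limit):
        super().__init__(f"{tool_name} exceeded its limit of {limit}")
        self.tool_name = tool_name
        self.limit = limit


class SketchContext:
    """Per-run counters guarded by one lock (hypothetical, not the library's code)."""

    def __init__(self, limits):
        self._limits = dict(limits)
        self._counts = {name: 0 for name in self._limits}
        self._lock = threading.Lock()

    def charge(self, name):
        # Check and increment under the same lock so two threads cannot
        # both observe "one slot left" and both proceed.
        with self._lock:
            if self._counts[name] >= self._limits[name]:
                raise SketchBudgetExceeded(name, self._limits[name])
            self._counts[name] += 1


def sketch_guarded(ctx, name):
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            ctx.charge(name)  # raises before the tool body runs
            return func(*args, **kwargs)
        return wrapper
    return decorate


ctx = SketchContext({"search": 5})
executed = []


@sketch_guarded(ctx, name="search")
def search(query):
    executed.append(query)  # list.append is thread-safe in CPython
    return f"results for {query}"


def attempt(i):
    try:
        search(f"q{i}")
        return True
    except SketchBudgetExceeded:
        return False


# 50 attempts from 8 threads against a cap of 5.
with ThreadPoolExecutor(max_workers=8) as pool:
    outcomes = list(pool.map(attempt, range(50)))
```

The important detail is that the check and the increment happen inside one critical section. Splitting them into two separate locked steps would reopen the race.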

You do not have to call ctx.reset() between runs. You just make a new RunContext. The old one gets garbage collected. No shared state to clean up.

For agents that spawn sub-agents, you can pass the same RunContext down. The child agent's tool calls count against the parent's budget. Or you create a child RunContext with tighter limits. Both patterns are supported.

# Sub-agent shares the parent's budget.
def run_sub_agent(ctx: RunContext, subtask: str):
    @guarded(ctx, name="search")
    def search(query: str) -> list[str]:
        return my_search_api(query)

    return run_agent(tools=[search], task=subtask)
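
The second pattern, a child context with tighter limits, could be sketched as below. Context, BudgetExceeded, and child_with_tighter_limits are hypothetical stand-ins written for this example. Capping the child at whatever the parent has left is one possible policy and an assumption of this sketch, not documented library behavior.

```python
class BudgetExceeded(Exception):
    """Stand-in exception for this sketch."""


class Context:
    """Minimal stand-in for a per-run context: limits plus counters."""

    def __init__(self, limits):
        self.limits = dict(limits)
        self.counts = {name: 0 for name in limits}

    def charge(self, name):
        if self.counts[name] >= self.limits[name]:
            raise BudgetExceeded(name)
        self.counts[name] += 1


def child_with_tighter_limits(parent, overrides):
    """Build an independent child whose caps never exceed what the parent has left."""
    limits = {}
    for name, cap in parent.limits.items():
        remaining = cap - parent.counts[name]
        limits[name] = min(overrides.get(name, remaining), remaining)
    return Context(limits)


parent = Context({"search": 5, "fetch_url": 10})
parent.charge("search")  # parent has used 1 of 5 searches

child = child_with_tighter_limits(parent, {"search": 2})
child.charge("search")
child.charge("search")
try:
    child.charge("search")  # third call exceeds the child's cap of 2
    child_blocked = False
except BudgetExceeded:
    child_blocked = True
```

Note the trade-off in this sketch: the child's calls are counted separately, so they do not reduce the parent's remaining budget. Sharing the parent's context, as in the example above, is the option to choose when the sub-agent's calls should count against the same total.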



When this is useful

  • You have an agent with tools that have real-world costs or side effects. send_email, post_to_slack, create_ticket. You never want those called more than once or twice per run.
  • Your agent runs in a loop (ReAct pattern, tool-use loops, multi-step planners) and you want a hard ceiling on how many iterations any given tool can contribute.
  • You are debugging a looping agent in staging and want a fast fail instead of waiting 20 minutes and paying for 500 calls before killing the process.
  • You are deploying to users and want to guarantee that a single bad input cannot cause an agent to spend an unlimited amount on search or API calls.

When this is NOT what you want

  • For simple scripts with one LLM call and no loop. A plain if-statement is enough.
  • For caps based on token count or USD spend. Use token-budget-py for those. For time-windowed caps, use llm-budget-window.
  • For preventing duplicate calls with the same arguments. That is caching, not counting. Use tool-call-cache to memoize tool results so budget is not wasted on repeated identical calls.
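
The caching point in the last bullet can be illustrated with the standard library alone. This sketch uses functools.lru_cache as a stand-in for a memoizing layer; it does not use tool-call-cache, whose API is not shown in this post. counted_search and budget_spent are hypothetical.

```python
import functools

budget_spent = 0


def counted_search(query: str) -> str:
    """Stand-in for a budgeted tool: every real invocation costs one slot."""
    global budget_spent
    budget_spent += 1
    return f"results for {query}"


# Cache on the outside, so a repeated query is answered before the
# counting layer is ever reached.
cached_search = functools.lru_cache(maxsize=None)(counted_search)

for query in ["rust wasm", "rust wasm", "rust wasm", "react maps"]:
    cached_search(query)
```

Four calls come in, but only the two distinct queries reach the counted function. The order of the layers matters: with the counter on the outside, the repeats would still consume budget.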

Install

pip install tool-call-budgets


No dependencies. Zero. The library is pure Python with no third-party imports.

GitHub: MukundaKatta/tool-call-budgets

44 tests, all passing.


Sibling libraries

| Lib | Boundary | Repo |
| --- | --- | --- |
| token-budget-py | Token and USD cap per run | MukundaKatta/token-budget-py |
| llm-budget-window | Time-windowed cap (per minute, hour, day) | MukundaKatta/llm-budget-window |
| tool-call-cache | Memoize tool results so budget is not wasted on repeated calls | MukundaKatta/tool-call-cache |
| llm-circuit-breaker-py | Error-rate circuit breaker for LLM calls | MukundaKatta/llm-circuit-breaker-py |
| agent-deadline | Cooperative time cap per agent run | MukundaKatta/agent-deadline |

token-budget-py and tool-call-budgets are the most common pair. Token budget says "stop spending money." Call-count budget says "stop calling this tool." Combined, you have both a USD ceiling and a per-tool ceiling. Neither one alone covers the full picture.


What is next

A few things are on the list:

  • A ctx.summary() method that returns a dict of {tool_name: {calls: N, limit: M, remaining: K}} so agent harnesses can log or display budget status mid-run.
  • A soft-limit mode that logs a warning at 80% of the cap without raising, giving the agent a chance to wrap up gracefully before hitting the hard stop.
  • A ToolBudgets.from_config(path) loader for teams that want to define limits in a YAML or JSON config file rather than in code.

The core loop is stable. @guarded, RunContext, ToolBudgetExceeded. Those three pieces cover the most common failure mode: a tool called too many times in a single run, unnoticed, until the invoice arrives.


Built for the Hermes Agent Challenge. Part of a series of small libraries for production agent infrastructure.