The Architecture Behind Cost-Effective AI Agents
Aruna Veerappan · 2026-05-22 · via Forbes - Innovation

Aruna Veerappan is Senior Director of Engineering at Upwork, leading Developer Enablement to reduce friction and boost team productivity.

Engineering leaders are discovering that the hardest part of AI agents isn't the AI—it's the architecture underneath.

I learned this firsthand when a quarterly budget disappeared in weeks. Nothing was broken, the models worked and the engineers were strong. But the system hadn't been designed for cost, and the bill arrived before a single workflow reached production.

The root cause: we were pointing expensive models at every task. Verifying file existence. Checking ownership against APIs. Routing logic that could have been a single if-statement. Each call seemed reasonable. The cumulative cost was not.

I've come to call this the Agent Cost Spiral—and engineering teams across the industry are running into it right now.

"An Agent Cost Spiral isn't an AI problem. It's an architecture problem. And once you see it, you can't unsee it."

This pattern has a precedent. A decade ago, teams migrated to the cloud chasing savings, then watched their bills explode past on-premise costs. The architecture was the problem—not the technology. AI inference costs follow the same arc. The fix: stop treating it like a utility and start treating it like an engineering problem.​

Tiered Architecture Every Agentic System Needs

A well-built AI agent isn't a single model receiving a single prompt. It's a choreographed system where each task is matched to the minimum level of intelligence required to complete it well.

Tier 1: The Deterministic Skeleton—Just Use Code

If your process follows a fixed rule—"if a customer's order exceeds $5,000, route to a Senior Rep"—you don't need AI. You need a conditional statement. Enterprise teams routinely spend real money asking frontier models to handle basic routing logic, and the cost problem is the smaller concern. AI is probabilistic, which means even a capable model can get a simple rule wrong some percentage of the time. For business logic that must be consistent 100% of the time, probabilistic is another word for broken. Build your guardrails in code. Let AI operate within them.
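The routing rule quoted above fits in a few lines of ordinary code. The following is a minimal sketch, not a prescribed implementation: the `Order` type, the queue names and the choice of a strict "greater than" comparison are assumptions made for illustration.

```python
from dataclasses import dataclass

# Threshold taken from the example rule above; everything else is hypothetical.
SENIOR_REP_THRESHOLD_USD = 5_000


@dataclass(frozen=True)
class Order:
    order_id: str
    total_usd: float


def route_order(order: Order) -> str:
    """Return the queue for an order. Same input always gives the same output."""
    if order.total_usd > SENIOR_REP_THRESHOLD_USD:
        return "senior_rep"
    return "standard_queue"
```

Because the function is deterministic, it can be covered by an ordinary unit test, which is the kind of guarantee a probabilistic model call cannot give.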

Tier 2: The Workhorse Models—Cheap, Fast and Good Enough

Summarizing documents. Extracting fields from structured data. Reformatting outputs. These are real, valuable tasks—but they don't require a frontier model. Smaller "flash" models handle these workloads at roughly 1% of the cost of a premium model. If you're using a frontier model for this work, you're not just overpaying—you're slowing down your pipeline.

Tier 3: The Frontier Model—Reserve It For What It's Good At

Top-tier models are extraordinary at synthesis: taking conflicting information from multiple sources and producing nuanced, well-reasoned output. That's where the cost is justified. The mistake is giving them everything else too. When you feed a frontier model thousands of lines of raw, unfiltered context, two bad things happen—costs spike and quality drops. The right move is to let Tier 2 do the reading and summarizing, then hand a clean, pre-processed brief to your Tier 3 model. You're paying for reasoning, not retrieval.​
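One way to make the three tiers explicit is a lookup from task kind to tier, consulted before any model is called. This is a sketch under stated assumptions: the task labels are hypothetical names for the examples mentioned in the article, and rejecting unknown task kinds (rather than defaulting to the frontier model) is a design choice of this sketch, not something the article specifies.

```python
from enum import Enum


class Tier(Enum):
    CODE = 1        # deterministic logic, no model call
    WORKHORSE = 2   # small, cheap model
    FRONTIER = 3    # premium model, reserved for synthesis


# Hypothetical task labels mapped to the minimum tier that handles them well.
TASK_TIERS = {
    "file_exists": Tier.CODE,
    "ownership_check": Tier.CODE,
    "routing": Tier.CODE,
    "summarize": Tier.WORKHORSE,
    "extract_fields": Tier.WORKHORSE,
    "reformat": Tier.WORKHORSE,
    "synthesize": Tier.FRONTIER,
}


def pick_tier(task_kind: str) -> Tier:
    """Return the tier for a task kind; unknown kinds must be classified first."""
    try:
        return TASK_TIERS[task_kind]
    except KeyError:
        raise ValueError(f"unclassified task kind: {task_kind!r}") from None
```

Raising on an unknown task kind forces someone to decide which tier a new task belongs to, instead of letting it drift silently onto the most expensive model.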

What This Looks Like In Practice

One of the most common enterprise headaches is keeping technical documentation current: most teams either let it go stale or throw expensive engineering hours at it.

The Lazy Approach

Send the entire codebase to a premium model and ask for documentation. Cost: ~$15 per service. The model is overwhelmed by irrelevant code, hallucinates configuration details, misses security settings and gets version numbers wrong.

The Architected Approach

This approach can be divided into three tiers:

Tier 1. Code: automatically identify and extract the relevant configuration files. No AI needed, just pattern matching.

Tier 2. Workhorse Model: summarize those files into a structured brief. Fast, cheap and accurate.

Tier 3. Frontier Model: take the brief and write the final, polished documentation.

"Cost: $0.50 per service. Accuracy: measurably higher. That's a 30× cost reduction with better output—not a trade-off."

The quality improvement isn’t incidental—it’s structural. The frontier model performs better because it’s receiving cleaner input. You’ve set it up to succeed.​
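The three-tier documentation flow described above can be sketched as a small pipeline. This is an illustration only: the file patterns, the function names and the `summarize` / `write_docs` callables (stand-ins for a workhorse model call and a frontier model call) are all assumptions, and no real model API is used.

```python
from fnmatch import fnmatchcase
from typing import Callable

# Assumed patterns for "relevant configuration files"; adjust per codebase.
CONFIG_PATTERNS = ("*.yaml", "*.yml", "*.toml", "Dockerfile")


def select_config_files(files: dict[str, str]) -> dict[str, str]:
    """Tier 1: plain pattern matching on file names, no model involved."""
    return {
        path: text
        for path, text in files.items()
        if any(fnmatchcase(path.rsplit("/", 1)[-1], pat) for pat in CONFIG_PATTERNS)
    }


def document_service(
    files: dict[str, str],
    summarize: Callable[[str], str],   # Tier 2: hypothetical workhorse-model call
    write_docs: Callable[[str], str],  # Tier 3: hypothetical frontier-model call
) -> str:
    """Filter with code, summarize cheaply, then hand a short brief to Tier 3."""
    configs = select_config_files(files)
    brief = "\n".join(
        f"{path}: {summarize(text)}" for path, text in sorted(configs.items())
    )
    return write_docs(brief)
```

The frontier call only ever sees the brief, never the raw codebase, which is the structural reason the article gives for both the lower cost and the better output.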

The Staircase Scaling Rule

There's a second failure mode that hits teams who've already built something good. The agent tests well, confidence is high and someone makes the call to run it on everything at once.

High-cost failures almost always trace back to under-validated systems running at scale. The fix is Staircase Scaling—earning the right to scale by proving the system at each step before moving to the next.

Step 1. The Quintet (n=5): run five samples and manually review every output. If the agent fails here, your debugging cost is $2, not $2,000.

Step 2. The Squad (n=15): run a more diverse batch of fifteen. This is where edge cases surface.

Step 3. Full Rollout: only when your Squad pass rate is consistently above 90% should you scale to the full dataset.

This sounds slow. It isn't. Teams that skip this process lose weeks to remediation. Teams that follow it reach production confidently within days.
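The staircase can be expressed as a gate that must be passed before each larger batch. The sketch below takes the sample sizes and the 90% figure from the steps above; applying the same threshold to the Quintet, running each stage only once (the article says "consistently"), and the `run_agent` / `passes_review` callables are simplifying assumptions. `passes_review` stands in for the manual review a person would do.

```python
from typing import Callable, Sequence

STAGES = (("quintet", 5), ("squad", 15))  # sample sizes from the steps above
PASS_THRESHOLD = 0.90                     # pass rate must be strictly above this


def staircase(
    items: Sequence,
    run_agent: Callable,       # hypothetical: produces an output for one item
    passes_review: Callable,   # hypothetical: True if the output clears review
) -> str:
    """Return 'full_rollout' if every stage clears the threshold, else the failing stage."""
    start = 0
    for name, size in STAGES:
        batch = items[start:start + size]
        if not batch:
            return f"stopped_at_{name}"
        passed = sum(1 for item in batch if passes_review(run_agent(item)))
        if passed / len(batch) <= PASS_THRESHOLD:
            return f"stopped_at_{name}"
        start += size
    return "full_rollout"
```

With these numbers, one failure in the Quintet (4 of 5, or 80%) stops the run, while the Squad tolerates a single failure (14 of 15, about 93%) but not two.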

The Only Metric That Matters

Here's what separates teams genuinely automating from teams just shifting work around: Cost per Successful Output (CSO)—not cost per API call or tokens consumed, but cost per output that clears your quality bar without human correction.

If a senior engineer spends three hours cleaning up AI-generated documentation that cost $500 to produce, nothing was automated. The work simply moved—with frustration on top. The real test is whether your CSO is lower than the cost of a human doing the same task well. Everything else is theater.
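The article defines CSO in words rather than as a formula. One plausible reading is total spend divided by the number of outputs that needed no human correction, sketched below. The `Run` record, the decision to count the cost of failed runs in the numerator and the comparison helper are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Run:
    cost_usd: float          # model spend for this output
    needed_human_fix: bool   # True if a person had to correct it


def cost_per_successful_output(runs: Iterable[Run]) -> float:
    """Total spend divided by outputs that cleared the bar without correction."""
    runs = list(runs)
    total = sum(r.cost_usd for r in runs)
    successes = sum(1 for r in runs if not r.needed_human_fix)
    if successes == 0:
        return float("inf")  # nothing usable was produced
    return total / successes


def beats_human(runs: Iterable[Run], human_cost_per_output_usd: float) -> bool:
    """The article's test: is CSO lower than a human doing the task well?"""
    return cost_per_successful_output(runs) < human_cost_per_output_usd
```

For example, ten runs at $0.50 each with eight clean outputs cost $5.00 in total, so the CSO is $0.625 per successful output, not the $0.50 per call that the invoice suggests.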

Engineering leaders getting this right share a shift: they stopped asking "Which model is smartest?" and started asking "What does each task need?" Costs start making sense. Failure modes become predictable. The architecture becomes something you can defend to a CFO.

You don't need the smartest model. You need the right model for each job—and the discipline to know the difference.

The Agent Cost Spiral is real. It isn't a reason to pull back—it's a reason to build deliberately. Get the architecture right first. The ROI will follow.​
