Loops are replacing prompts. Verification is about to be your biggest problem.
Arjun Iyer · 2026-06-14 · via The New Stack | DevOps, Open Source, and Cloud Native News

Something shifted in the AI coding discourse this month. The argument is no longer about whether agents can write production code or which model is best. It is about who, or what, should be prompting them.

The phrase that captured it, “design loops that prompt your agents,” set off a week of discussion that Matt Van Horn synthesized well. Strip away the noise, and the signal is clear: A third era of agentic development is taking shape, and it raises significant questions about what engineers do, what software delivery costs, and what infrastructure must exist beneath it.

For teams building cloud-native applications on Kubernetes, the answers to those three questions are about to matter a great deal.

Three eras, one direction

Agentic development has moved the human up one level of abstraction at a time.

The first era was prompt-driven. A developer sat inside the loop, typing instructions, reading output, typing corrections. Throughput was capped at one developer’s attention.

The second era is spec-driven, and it is where most adopting teams sit today. The developer invests up front: detailed specifications, context documents, conventions encoded in the repo. The agent executes against the spec, and the human reviews completed work. The unit of work grew from a prompt to a task.

The third era makes the loop itself the unit of work. A loop is a small program that prompts the agent, evaluates the response, decides whether the goal is met, and, if not, prompts again with what it has learned. It runs on a schedule rather than on human attention. Loops dispatch other loops.
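
The shape described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `run_loop`, `agent`, and `evaluate` are hypothetical names, and the stand-in agent and evaluator exist only so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoopResult:
    succeeded: bool
    iterations: int
    output: Optional[str]


def run_loop(
    goal: str,
    agent: Callable[[str], str],               # hypothetical: prompt in, candidate out
    evaluate: Callable[[str], Optional[str]],  # hypothetical: None if goal met, else feedback
    max_iterations: int = 5,
) -> LoopResult:
    """Prompt, evaluate, and re-prompt with what was learned until the goal is met."""
    prompt = goal
    for i in range(1, max_iterations + 1):
        candidate = agent(prompt)
        feedback = evaluate(candidate)
        if feedback is None:
            return LoopResult(True, i, candidate)
        # Fold the failure back into the next prompt.
        prompt = f"{goal}\n\nPrevious attempt failed: {feedback}"
    return LoopResult(False, max_iterations, None)


# Stand-in agent and evaluator, for illustration only.
def fake_agent(prompt: str) -> str:
    return "fixed" if "failed" in prompt else "broken"


def fake_evaluate(candidate: str) -> Optional[str]:
    return None if candidate == "fixed" else "test_checkout returned 500"


result = run_loop("Make test_checkout pass", fake_agent, fake_evaluate)
print(result)  # LoopResult(succeeded=True, iterations=2, output='fixed')
```

The human appears nowhere in the body of the loop. Everything that decides whether the work is correct lives in `evaluate`, which is why the rest of this article is about what goes behind that function.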

The developer no longer writes the code, and increasingly no longer writes the task. They write the system that generates, evaluates, and retries the work.

Skeptics have called this a “cron job with better marketing,” and the comparison is useful for where it breaks. A cron job executes a fixed script. A loop has a decision-maker in the body: a model that reads the state of the work and chooses the next action. The engineering challenge is everything you wrap around that decision so the loop converges on a correct result instead of wandering.

What the developer becomes

In a prompt-driven world, the developer’s leverage came from skill at steering. In a spec-driven world, it came from clarity of intent.

In a loop-driven world, leverage comes from the quality of the system around the agent. The engineering questions that matter become: What does this loop check before it declares success? What feedback does it receive when it fails? When does it stop?

Those questions split the role in two. Application developers become authors of intent and of loops: they decide what to build and encode the goal. Platform engineers become the owners of what “done” means.

The checks a loop runs, the environments it runs them in, the budget it operates under, and the evidence it attaches to its output all have to be consistent across every loop in the organization, not improvised by each one.

This is a familiar pattern with a new subject. A decade ago, platform teams turned CI/CD from a thing every team hacked together into a paved road. The verification layer for agent loops is the same transition, except it sits in the inner loop, before any PR exists, at whatever parallelism the loops generate.

The economics: loops converge, or they burn

It is tempting to assume agents made code generation cheap, so none of this matters. Not quite.

Agents made generation fast, and cheaper than developer hours, but tokens are a real and growing line item. An agent running for hours is a spend decision, and a loop is an agent running indefinitely by design. Budgets that survived interactive sessions do not automatically survive loops.

The first-order response is guardrails: iteration caps, no-progress detection, and spend ceilings. They are necessary, but they only bound the damage of an inefficient loop. They do not make it efficient.
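
As a rough sketch of those three guardrails, assuming a loop that reports the cost of each failed iteration and a short fingerprint of the failure it hit. The `Guardrails` class, its field names, and the default limits are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Guardrails:
    max_iterations: int = 10
    max_spend_usd: float = 5.0
    no_progress_limit: int = 3   # identical failures in a row before giving up
    iterations: int = 0
    spend_usd: float = 0.0
    _last_failure: Optional[str] = None
    _repeat_count: int = 0

    def record(self, cost_usd: float, failure: str) -> None:
        """Record one failed iteration: its cost and a fingerprint of the failure."""
        self.iterations += 1
        self.spend_usd += cost_usd
        if failure == self._last_failure:
            self._repeat_count += 1
        else:
            self._last_failure = failure
            self._repeat_count = 1

    def stop_reason(self) -> Optional[str]:
        """Return why the loop must stop, or None if it may continue."""
        if self.iterations >= self.max_iterations:
            return "iteration cap reached"
        if self.spend_usd >= self.max_spend_usd:
            return "spend ceiling reached"
        if self._repeat_count >= self.no_progress_limit:
            return "no progress"
        return None


g = Guardrails()
for _ in range(3):
    g.record(cost_usd=0.40, failure="AssertionError in test_checkout")
print(g.stop_reason())  # no progress
```

Note what the sketch cannot do: it stops a loop that keeps hitting the same failure, but it has no way to help the loop hit a different one.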

Loop efficiency comes down to two dimensions, and they multiply.

The first is feedback quality, which determines how many iterations the loop needs. A loop that gets a vague failure signal guesses at the cause and tries again. A loop that gets the real error, from the real system, with enough context to localize the cause, fixes the actual problem and moves on. Feedback quality also bounds how correct the loop can ever be: a loop can only converge on what its feedback can see.

The second is where the loop closes, which determines the cost of each iteration. If the loop closes in CI or after the PR, every cycle pays for a pipeline run plus queueing, and the cadence is minutes to hours. If it closes in the inner loop, against a runtime the agent can reach directly, the cadence is seconds.

Total loop cost is the product: iterations to verified, times cost per iteration. The dimensions feed each other too. Push truthful feedback to CI, and each cycle slows while the agent iterates on partial signals in between. Pull it into the inner loop, and you compress both terms at once.
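
A worked example of the product, with made-up numbers. Every figure below is hypothetical and chosen only to show how the two terms multiply; `LoopProfile` is an illustrative name, not a measurement of any real system.

```python
from dataclasses import dataclass


@dataclass
class LoopProfile:
    iterations_to_verified: int
    cost_per_iteration_usd: float   # tokens plus infrastructure, per cycle
    seconds_per_iteration: int

    def total_cost_usd(self) -> float:
        return self.iterations_to_verified * self.cost_per_iteration_usd

    def total_seconds(self) -> int:
        return self.iterations_to_verified * self.seconds_per_iteration


# Hypothetical: partial feedback between slow, expensive pipeline runs.
closes_in_ci = LoopProfile(
    iterations_to_verified=8, cost_per_iteration_usd=2.50, seconds_per_iteration=900
)
# Hypothetical: truthful feedback from a runtime the agent reaches directly.
closes_in_inner_loop = LoopProfile(
    iterations_to_verified=3, cost_per_iteration_usd=0.75, seconds_per_iteration=30
)

for name, p in [("CI", closes_in_ci), ("inner loop", closes_in_inner_loop)]:
    print(f"{name}: ${p.total_cost_usd():.2f}, {p.total_seconds()} s")
# CI: $20.00, 7200 s
# inner loop: $2.25, 90 s
```

Under these assumed inputs, fewer iterations and cheaper iterations compound: the inner-loop profile is roughly a ninth of the cost and a fraction of the wall-clock time.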

Cloud-native raises the bar for done

For a self-contained program, both dimensions come nearly free. The test suite runs locally in seconds and tells the whole truth, because the whole truth is local.

In a distributed system, the truth is not local. A change is correct or broken based on how it behaves alongside the services it calls, the data stores and queues it touches, and the routing and policy layers it runs under.

The feedback the agent can reach quickly, local tests and stubs, is partial. The feedback that tells the truth traditionally lives in CI and shared staging, where cycles are slow and contended. Cloud-native teams are forced to choose between a fast loop that converges on the wrong target and a truthful loop that iterates at pipeline speed.

I have written before about why the traditional environment options fall short at agent scale. The conclusion that matters here: in cloud-native systems, the loop’s feedback has to come from a runtime, and that runtime has to be reachable from the inner loop. The architecture of a cloud-native agent loop is mostly the architecture of the verification surface you give it.

The four layers of a cloud-native agent loop

The complete system has four layers. The loop itself is the smallest part.

The runtime layer. The loop needs an environment per iteration that behaves like production without costing like production. The answer is lightweight ephemeral environments on a shared cluster: deploy only the services the change touches, and use request routing to steer the loop’s traffic through them while a live shared baseline serves everything else.

Environments materialize in seconds, marginal cost tracks the changed pods rather than the whole stack, and the feedback comes from real dependencies and real data paths. This is what moves where the loop closes into the inner loop.
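
The routing decision at the heart of this layer can be sketched as a lookup. This is a simplified illustration under stated assumptions, not how any particular product implements it: the header name `x-routing-key`, the service addresses, and the `resolve` function are all hypothetical.

```python
from typing import Dict, Mapping

ROUTING_HEADER = "x-routing-key"  # hypothetical header name

# Shared baseline: every service, always running (hypothetical addresses).
BASELINE: Dict[str, str] = {
    "checkout": "checkout.baseline.svc:8080",
    "payments": "payments.baseline.svc:8080",
    "inventory": "inventory.baseline.svc:8080",
}

# Each ephemeral environment lists only the services its change touches.
SANDBOXES: Dict[str, Dict[str, str]] = {
    "loop-42": {"checkout": "checkout.loop-42.svc:8080"},
}


def resolve(service: str, headers: Mapping[str, str]) -> str:
    """Use the sandboxed copy if this request's key overrides the service, else baseline."""
    key = headers.get(ROUTING_HEADER, "")
    overrides = SANDBOXES.get(key, {})
    return overrides.get(service, BASELINE[service])


loop_request = {ROUTING_HEADER: "loop-42"}
print(resolve("checkout", loop_request))  # checkout.loop-42.svc:8080
print(resolve("payments", loop_request))  # payments.baseline.svc:8080
print(resolve("checkout", {}))            # checkout.baseline.svc:8080
```

For this to work across a call chain, each service has to forward the routing key on its outbound requests, so that a request entering through a sandboxed service still carries its key when it reaches the next hop.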

The verification interface. Agents do not click through dashboards, and they should not invent their own definition of done. The checks a change must pass belong in declarative workflows that platform teams define, version, and expose to agents as the sanctioned way to prove a change.

The organization, not the agent, decides what passing means. The evidence attached to a change comes from a process humans can audit, and human review can concentrate on intent and design.
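
One way to picture such an interface: the workflow is data that the platform team owns and versions, and the agent can only submit a change and receive evidence back. The `Workflow` shape, the check names, and the thresholds below are hypothetical, and real checks would run against a live environment rather than read fields from a dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A check receives a description of the change and returns True on pass (hypothetical shape).
Check = Callable[[dict], bool]


@dataclass(frozen=True)
class Workflow:
    name: str
    version: str
    checks: Tuple[Tuple[str, Check], ...]


def verify(workflow: Workflow, change: dict) -> dict:
    """Run every check in the workflow and return auditable evidence."""
    results = {check_name: check(change) for check_name, check in workflow.checks}
    return {
        "workflow": workflow.name,
        "version": workflow.version,
        "results": results,
        "passed": all(results.values()),
    }


# Defined and versioned by the platform team, not by the agent.
API_CHANGE = Workflow(
    name="api-change",
    version="1.2.0",
    checks=(
        ("has_tests", lambda c: c.get("tests_added", 0) > 0),
        ("p95_latency_under_200ms", lambda c: c.get("p95_latency_ms", float("inf")) < 200),
    ),
)

evidence = verify(API_CHANGE, {"tests_added": 2, "p95_latency_ms": 250})
print(evidence["passed"])   # False
print(evidence["results"])  # {'has_tests': True, 'p95_latency_under_200ms': False}
```

Because the evidence records the workflow name and version, a reviewer can tell which definition of done a change was held to without trusting the agent's own account.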

The feedback layer. This is the convergence engine. A pass or fail bit tells the loop to retry but not what to change. The runtime needs to hand back structured results: which check failed, the logs and traces scoped to the loop’s own requests, the behavior diff at the boundary that broke.

Every increment of feedback precision removes guesswork, and removed guesswork is removed cost.
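
A sketch of what a structured result might look like, and how it could be folded into the loop's next prompt. The `CheckFailure` fields mirror the list above; the field names, the sample values, and the `to_prompt` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckFailure:
    check: str        # which check failed
    logs: List[str]   # log lines scoped to the loop's own requests
    trace_id: str     # trace for the failing request
    expected: str     # behavior at the boundary before the change
    actual: str       # behavior at the boundary after the change


def to_prompt(failure: CheckFailure, max_log_lines: int = 5) -> str:
    """Render a failure as text the agent can act on, keeping only the newest log lines."""
    tail = failure.logs[-max_log_lines:]
    lines = [
        f"Check failed: {failure.check}",
        f"Trace: {failure.trace_id}",
        f"Expected: {failure.expected}",
        f"Actual: {failure.actual}",
        "Relevant logs:",
        *[f"  {line}" for line in tail],
    ]
    return "\n".join(lines)


failure = CheckFailure(
    check="checkout_contract",
    logs=["POST /charge 200", "POST /charge 422 missing field 'currency'"],
    trace_id="trace-abc123",
    expected="payments accepts the charge request",
    actual="payments rejects the request with 422",
)
print(to_prompt(failure, max_log_lines=1))
```

Compare that with a bare `False`: the structured version names the boundary that broke and the field that is missing, so the next iteration starts from the cause rather than from a guess.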

The control layer. Budgets, iteration ceilings, no-progress detection, and a durable record of what each loop ran and proved. This is what lets an organization run many loops with confidence instead of one loop with anxiety.

When spend is bounded and convergence is measured, agentic development becomes a capacity you can plan rather than a bill you discover.

“Write loops, not prompts” is the visible tip of a larger claim: the team’s leverage now lives in the verification system the loops share, and that system is owned by the platform organization.

Where to start

The shift to loop-based development will not arrive as a single decision. It will arrive as one team’s experiment that converges fast and another’s that burns a quarter’s budget producing changes nobody trusts. The difference will be the four layers, not the cleverness of the loops.

For cloud-native teams, the runtime layer is the place to start, because every other layer depends on it. Verification workflows are only as meaningful as the environment they execute in, and feedback is only as truthful as the system that generates it.

That layer is what we build at Signadot: Kubernetes-native lightweight ephemeral environments and governed validation workflows that let agent loops prove changes against the real system, in seconds, in the inner loop. Check out our docs to see how it works.

If your agents are writing code faster than your team can trust it, the loop is already telling you which layer is missing.
