The complaints we’re hearing today may sound new:
“Agents write code faster than humans can review it.”
“Automated pipelines quietly disable failing tests to meet deadlines.”
“Quality slips as the volume of contributions outpaces anyone’s ability to keep up.”
But they are not new.
“You’re welcome to the party,” said Monty Taylor, Zuul maintainer, in a recent episode of OpenInfra Live titled “The AI Challenges Zuul Already Solved.” I hosted that conversation as a Zuul maintainer and infrastructure engineer at the OpenInfra Foundation, joined by Taylor; James Blair, CEO of Acme Gating and Zuul project lead; and Johannes Foufas, the technical manager driving Zuul CI at Volvo Cars. Their argument: the open source community built the answers to today’s AI development problems more than a decade ago, and the tools have been running in production ever since.
Zuul grew out of the early development of OpenStack. When the project launched in 2010, patches arrived from far more contributors than its core reviewers could keep up with. The people with enough context to evaluate all those patches were quickly outnumbered.
“We had a large number of contributions from a large number of people with varying backgrounds, varying skill levels, varying objectives,” said Blair, who started Zuul to address exactly that challenge. The answer was project gating: an automated system that enforces quality standards before any change can reach the codebase.
Today, the same dynamic is playing out at companies of every size. Teams spinning up AI agents find themselves buried under the same volume problem OpenStack faced in 2011. “Now you start up your own thing, and you spin up some AI agents, and you basically have the same problem OpenStack had 15 years ago,” Taylor said. “I can’t possibly read all of the code that this fleet of agents just built for my personal website.”
One of the more interesting moments in the panel came when the group discussed what happens when agents are left without guardrails. The behavior, it turns out, mirrors human behavior under deadline pressure almost exactly.
Taylor recalled an incident from years ago at HP. A development team was behind schedule on a networking feature that kept failing its integration tests. Their proposed fix was to ask the infrastructure team to disable the tests and let them ship.
“That is exactly why it’s there,” Taylor said, “so you don’t deliver broken networking products to your customers.” His point is the principle behind Zuul’s project gating: tests are enforced at the gate before any change reaches the codebase, and no single contributor can switch them off to make a deadline. Without that structure, agents make the same call the HP team tried to make. Given the chance, they will delete a failing test rather than fix what the test is measuring. “They are just parroting us,” he said.
Zuul’s project gating removes that option. Tests can live in a separate repository managed by a different team, so an agent working on application code cannot fix a failure by eliminating the test. That structure was designed years ago for human contributors. It works just as well on autonomous ones.
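To make the structural point concrete, here is a minimal Python sketch under stated assumptions. It is a toy model, not Zuul’s API or configuration: the `Gate` and `Change` classes and the `networking_still_enabled` check are hypothetical names invented for illustration. The test suite is handed to the gate from outside, so a change can only rewrite application files and has no field through which it could delete a test.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

# A test inspects the proposed application files and returns True on success.
Test = Callable[[Mapping[str, str]], bool]


@dataclass(frozen=True)
class Change:
    author: str
    files: Mapping[str, str]  # path -> new contents, application repository only


class Gate:
    """Hypothetical gate whose test suite is owned outside any change."""

    def __init__(self, tests: list[Test]) -> None:
        # Assumption: the tests come from a separately reviewed repository,
        # so nothing inside a Change can edit or delete them.
        self._tests = tuple(tests)

    def check(self, codebase: Mapping[str, str], change: Change) -> bool:
        # Evaluate the code as it would look after the change merges.
        proposed = {**codebase, **change.files}
        return all(test(proposed) for test in self._tests)


def networking_still_enabled(files: Mapping[str, str]) -> bool:
    # Hypothetical stand-in for a real networking integration test.
    return "enable_networking()" in files.get("net.py", "")


if __name__ == "__main__":
    codebase = {"net.py": "enable_networking()\n"}
    gate = Gate([networking_still_enabled])
    print(gate.check(codebase, Change("agent", {"net.py": "pass\n"})))     # False
    print(gate.check(codebase, Change("agent", {"README.md": "docs\n"})))  # True
```

A change that strips out the networking call fails the gate, while an unrelated documentation change passes. In this model the only way to relax the test is a separate change to the test suite itself, reviewed by whoever owns it.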
Foufas has seen this in his own vibe coding projects. “These agents, they are nasty sometimes, and they try to go around,” he said. His response was to build a wide base of tests and treat them as non-negotiable. “That’s the only way to keep these agents and get some quality, to build larger projects.”
Taylor noted the irony with some amusement. “I like to make the joke that we built Zuul to protect everybody from me,” he said.
The most practical insight from the episode came from Taylor’s description of a tiered review workflow he built for his own projects on OpenDev, the collaborative development platform where Zuul itself is developed.
The sequence: an agent writes code and submits it to Gerrit. A long-lived AI review agent reads the change and provides rapid feedback, often catching bugs in seconds. Only after a positive agent review does Zuul spin up its test suite. Only after both pass does the human look at the result.
Blair framed the logic plainly: “We can have the cheapest, fastest machines do a first pass on the review, and then the slightly more expensive machines do a pass on tests, and then we have our most precious resource, the humans, do what they’re good at, which is actually thinking in depth about these things.”
Taylor built this pipeline by writing a new Zuul configuration file. No changes to the underlying system were needed. “It absolutely just worked out of the box,” he said. “I did multiple months of use of OpenDev with agents without changing anything.”
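Blair’s cost ordering can be modeled as a short-circuiting chain of stages. The Python sketch below is a hypothetical illustration, not Taylor’s actual Zuul configuration, which the episode describes but this article does not reproduce. The stage names, the relative cost numbers and the toy approval rules are all assumptions, and the human stage is a placeholder that always approves.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Stage:
    name: str
    cost: int                        # hypothetical relative cost of running this stage
    approves: Callable[[str], bool]  # returns True to pass the change onward


def run_tiers(change: str, stages: list[Stage]) -> tuple[Optional[str], int]:
    """Run stages cheapest-first and stop at the first rejection.

    Returns the name of the rejecting stage (None if every stage approved)
    and the total cost spent getting there.
    """
    spent = 0
    for stage in sorted(stages, key=lambda s: s.cost):
        spent += stage.cost
        if not stage.approves(change):
            return stage.name, spent
    return None, spent


# Hypothetical stages with toy rules standing in for real reviewers and tests.
STAGES = [
    Stage("human-review", 100, lambda diff: True),  # placeholder for human judgment
    Stage("zuul-tests", 10, lambda diff: "delete test" not in diff),
    Stage("ai-review", 1, lambda diff: "TODO" not in diff),
]

if __name__ == "__main__":
    for diff in ["fix bug; TODO", "delete test", "fix bug"]:
        print(diff, "->", run_tiers(diff, STAGES))
```

The three sample diffs stop at `ai-review` after 1 cost unit, at `zuul-tests` after 11, and pass every stage after 111. Sorting by cost means an obviously flawed change consumes only the cheapest check, and nothing reaches the most expensive stage, the human, until both machine stages have approved it.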
He also shared a screenshot during the episode showing the Gerrit thread that results from this workflow: a back-and-forth between several AI review agents, Zuul pipeline results and, finally, his own comment at the bottom. His agents, named after characters from Brandon Sanderson’s Cosmere universe, interact through Gerrit’s standard comment interface. To anyone reading the thread, it looks like any other code review.
Foufas brought a different kind of scale to the conversation. Volvo Cars uses Zuul to manage a large multi-repository codebase supporting software running inside production vehicles. After a major integration milestone last summer, his team is working toward a point where roughly 800 developers will operate through a Zuul-managed virtual monorepo.
The driving pressure for AI-assisted review in that environment is the queue. As agents generate more code, patches pile up faster than reviewers can clear them. Foufas and colleagues have been experimenting with AI review as a first pass, including a shared internal repository they call Skynet, where team members pool agent configurations and prompt templates.
“The gating test is absolutely one of the most important aspects in enabling large-scale development,” Foufas said. “If you scale it, then you have the benefits.”
Blair pointed to the Android Open Source Project as another reference point. AOSP spans more than 1,000 repositories and over a terabyte of source code. For organizations building on top of it, that is just part of what their Zuul instance is managing.
Part of what puts Zuul in a strong position today is a set of design decisions the team made years ago, none of them with AI in mind.
Blair described the original thinking behind Zuul’s interface. Rather than build a separate dashboard that developers would need to monitor, the team made the code review system the interface. Comments, test results and pipeline status all appear directly in Gerrit. “We care more about the development process to the point that we’re perfectly willing to sacrifice that and be as invisible as possible,” he said.
That design works identically for an LLM-powered review agent: the same event stream, the same comment format, the same thread a human developer would use.
Another such decision: Zuul’s configuration is YAML stored inside the Git repositories Zuul is testing. The team chose this design because it seemed obviously correct long before anyone was calling it GitOps: changes to infrastructure should be tracked, reviewed and tested the same way software changes are. The practical consequence today is that agents capable of writing source code can also write and update Zuul configuration, using the same tools and the same review process as everything else.
A consistent thread ran through the conversation: the purpose of all this infrastructure is not to route humans around the development process. It is to make sure humans show up at the moments where their judgment is actually needed.
“The more you can have computers automate and take care of repetitive tasks, the better,” Taylor said, “because then you free up our minds to be used for what human minds are really good for.” Blair put it another way in his closing remarks: “Incorporating this into a human-centric view of the development process is a great way to set yourself up for success.”
For Taylor, the payoff has already arrived. Projects that sat on his backlog for years, not because they were impossible but because the overhead never made sense, are finally getting done. “It’s just been a massive acceleration,” he said.
The tools that made that possible were not built for the AI era. They were built for the open source era, a decade ago, by teams that faced the same fundamental problem: too much code, not enough reviewers, and no way to maintain quality without a system designed for exactly that job. Those tools are still there, still maintained, and still available to anyone who wants to use them.
Watch the full OpenInfra Live episode on YouTube. To learn more about Zuul, visit zuul-ci.org.