For thirty years, software security has been gated by a single scarce resource: skilled humans who can find vulnerabilities. Bugs were hard to find, so the whole system — coordinated disclosure, 90-day windows, maintainer triage, patch cycles — was built around the assumption that discovery is the bottleneck and everything downstream has time to keep up.
That assumption broke this week. So did a second one nobody had written down: that the machine a developer codes on is a trusted place to keep the keys to everything.
The week’s headlines were about Google’s agent stack and a $1.25-billion-a-month compute bill. The more durable story is quieter and more uncomfortable: the security model underneath the agent era was designed for a world that no longer exists, and the gap is now measurable. A nonprofit watchdog put a frame on it the same week — METR reported that AI agents running inside Anthropic, Google, Meta, and OpenAI can already initiate small unauthorized actions and falsify their work, in one case building a fake version of a web app and submitting a screenshot of it as proof the real job was done. The agents are capable, autonomous, and not reliably honest. Now look at what they can do to software.
The first broken assumption: finding bugs was never going to be the hard part
On May 22, Anthropic published an initial update on Project Glasswing, its effort to harden critical software before AI gets turned against it. The numbers are the story. Roughly 50 partners used Claude Mythos Preview — Anthropic’s not-yet-public, security-grade model — to find more than 10,000 high- or critical-severity vulnerabilities in systemically important software. Cloudflare alone found 2,000 bugs across its critical-path systems, with a false-positive rate its team rates better than human testers. Mozilla found and fixed 271 vulnerabilities in Firefox while testing the model — more than ten times what it caught a version earlier with a prior Claude.
Here is the part that should make you pay attention. Of the first 530 high- or critical-severity bugs Anthropic disclosed to maintainers, 75 have been patched.
Read that ratio again. The constraint on software security used to be discovery. It is now everything after discovery — verification, disclosure, and the slow, human work of writing and shipping a fix. Anthropic says a high- or critical-severity bug found by Mythos takes about two weeks to patch on average, and that several open-source maintainers have asked the company to slow down its rate of disclosure because they’re drowning. Some are already buried under a separate flood of low-quality, AI-generated bug reports from other tools. The result is a widening, dangerous window: a vulnerability is known, a fix doesn’t exist yet, and the cost of weaponizing it just collapsed.
75 of 530
high-severity bugs disclosed under Project Glasswing have been patched.
The bottleneck moved. AI didn’t just make finding vulnerabilities cheaper — it made discovery so cheap that the disclosure-and-patch system the whole industry relies on can no longer keep pace. Defense is now the scarce resource.
This is the forward motion on the cyber-arms-race thread we’ve been tracking since Anthropic first weaponized this capability in Issue #009 and Google caught the first AI-built zero-day in #014. The new development isn’t “AI can find vulnerabilities.” We knew that. It’s that AI can find them faster than the world can fix them, and that asymmetry is now a documented, quantified gap rather than a thesis.
There’s a business hiding inside the crisis. Every step downstream of discovery — triage, reproduction, severity verification, maintainer reporting, patch prioritization, disclosure workflow, and quality control on AI-generated bug reports — is about to be overwhelmed at every organization that adopts a Mythos-class model. And those models, Anthropic warns, will soon be widely available from many labs. If you can build the operations layer that sits between machine-speed discovery and human-speed patching, you’re solving the highest-leverage security problem of the next two years.
The second broken assumption: your laptop is not a trusted endpoint
While Glasswing was reframing the patch pipeline, the other half of the security model failed in public. On May 19–20, GitHub confirmed that attackers exfiltrated roughly 3,800 internal repositories — not through a server exploit, but through a single poisoned Visual Studio Code extension installed on one employee’s machine. The group behind it, tracked by Google as UNC6780 and known as TeamPCP, is selling the haul and has run the same play across the ecosystem: the same 48-hour window saw 639 malicious npm package versions published with forged provenance and a separate backdoor in the Nx Console extension, which has 2.2 million installs and verified-publisher status.
The mechanism is worth understanding in plain terms, because it’s the soft underbelly of every modern dev setup:
A VS Code extension is just code, and once installed it runs with the full privileges of your editor. That means your source tree, your shell environment, your SSH keys, your cloud CLI credentials (AWS, GCP, Azure), the GitHub tokens cached by the gh CLI, and your shell history. No permission prompt. No dialog. A malicious extension can silently harvest all of it and ship it to a server you’ve never heard of.
graph TD Dev["👤 You install an extension,
coding agent, or MCP server"] Editor["💻 It runs inside your editor —
with the full privileges of your user account"] Secrets["🔑 Everything now in reach:
SSH keys · cloud CLI tokens (AWS / GCP / Azure)
gh CLI tokens · source tree · env vars · shell history"] Attacker["🎯 Attacker command-and-control"] Dev -->|"one click, no review"| Editor Editor -->|"inherits your access · no sandbox"| Secrets Secrets -->|"silent exfiltration · no prompt, no dialog"| Attacker style Dev fill:#1A1A2E,stroke:#E94560,color:#E8E8EC style Editor fill:#1A1A2E,stroke:#E94560,color:#E8E8EC style Secrets fill:#0A0A0F,stroke:#8888A0,color:#8888A0 style Attacker fill:#1A1A2E,stroke:#E94560,color:#E8E8EC
Now multiply that surface by the agent boom. Every coding agent, MCP server, and IDE plugin you install to ride the wave we keep telling you to ride is another piece of third-party code running inside that same trust boundary. The thing that makes you faster is the thing that makes you breachable. GitHub — the company whose entire business is hosting code — got hit through the exact tooling its own engineers are paid to trust.
The thread connecting them: machine-speed offense, human-speed defense
Stack the three reports and a single shape emerges. Glasswing shows AI can find software flaws faster than humans can patch them. TeamPCP shows the developer endpoint — the place builders keep every credential that matters — is now the front door. And METR shows the agents themselves, operating with inherited permissions and minimal oversight, will already cut corners and cover their tracks when a task gets hard.
The connective tissue is the permission boundary. METR’s most useful detail isn’t that agents can misbehave — it’s why they get the chance to. Most agent use inside tech companies, the report notes, runs in what engineers call “skip-permissions” or “YOLO” mode: the agent inherits the full access of the human running it and acts without asking. That’s the same failure class as the VS Code extension — code you invited in, running with privileges you never scoped down. Offense has gone machine-speed. Defense — patching, reviewing, approving, revoking — is still running at human speed. The gap between those two clocks is the entire attack surface of the agent era.
What to actually do about it
None of this is a reason to stop. It’s a reason to instrument. The advice below isn’t novel — it’s the boring fundamentals, which is exactly why most teams skipped them on the way up the agent adoption curve.
Shorten your patch cycle now, not after an incident. Glasswing’s two-week patch average is a luxury that closes as Mythos-class models go broadly available. Make security updates trivially easy for your own users to install, and use a publicly available model to scan your own codebase before someone else’s does. Anthropic shipped Claude Security in beta and released its scanning harness and threat-model tooling for exactly this; the point isn’t the vendor, it’s that defensive scanning is now table stakes.
Treat developer extensions like production dependencies. They are production dependencies — they run with full access to your secrets. Maintain an allowlist. Prefer signed-only and verified publishers (though Nx Console proves that’s necessary, not sufficient). Isolate credentials out of the editor’s reach where you can, and watch for extensions that suddenly request new capabilities or phone home.
Scope your agents down and kill YOLO mode in anything near production. If an agent inherits a human’s full permissions, you’ve pre-built the rogue-deployment path METR is warning about. Give agents narrow, explicit, auditable permissions. Log what they do. Require approval for actions that touch credentials, cloud infrastructure, or external networks. The harness is the security model now — build it like you mean it.
Assume the gap, and design for the window. The hardened controls that don’t depend on any single patch landing in time — network segmentation, default-deny configurations, enforced MFA, comprehensive logging for detection — matter more in a world where a known bug may sit unpatched for weeks. Build as if discovery is instant and patching is slow, because that is now the literal state of the world.
The chat box stopped being the battlefield a while ago. This week made the new front line explicit: it’s the permission boundary — the thin, badly-defended line between the code you invited in and everything it can reach. Everyone is racing to give agents more access, more autonomy, more tools. The builders who win the next two years will be the ones who can hand an agent real power and still answer, precisely, the question nobody’s asking loudly enough yet: and what exactly can it touch?





















