Google DeepMind has a plan to protect itself from its own rogue AI agents | Fortune
Jeremy Kahn · 2026-06-18 · via Fortune | FORTUNE

Google has developed a new plan to police the increasingly capable AI agents it uses within its own AI research organization, and the company is publishing the so-called roadmap to help other AI labs counter the potential threat of rogue AI agents.

The Google DeepMind security plan involves a pivot away from the AI safety community’s typical focus on “the alignment problem”: the challenge of training an AI system so that its actions reliably match the intentions, values, and ethics of the humans who are managing it.

While continuing to say that alignment is one key safety component, Google’s roadmap acknowledges that the alignment problem may never be fully solved, and instead creates a layered security system that treats AI agents as potential rogue insiders within an organization. The 35-page technical report maps out a series of steps and procedures that are designed to catch potential adversarial behavior by AI agents. 

“If the first line of defense, alignment, fails, how can we mitigate harm anyway?” Rohin Shah, who leads the AGI Safety & Alignment team at Google DeepMind, told Fortune in an interview.

The AI agent framework borrows heavily from traditional cybersecurity, especially insider-threat prevention. “We borrow a lot from security, which already deals with the threat of internal employees who might be malicious, and we can apply these to a new setting,” Shah said. But, he noted, “AI is systematically different from humans.”

For one thing, AI agents might be able to act far faster and at greater scale than an individual rogue employee ever could. So there need to be systems that can control what tools and data an AI agent has access to, as well as systems that can monitor AI agent behavior and spot potentially aberrant patterns in real time.

There are other differences too. Many access controls and permissions systems for human employees are based on a particular employee’s role within the company: a systems administrator has certain privileges that a receptionist does not. Some companies, and even some software vendors selling AI agent “orchestration layer” software (which often includes access controls for AI agents), are currently attempting to port this idea over to a world of AI agents. They stipulate, for instance, that an AI agent should have the same access and permissions as the human employee on whose behalf it is working.

But that idea can quickly break down, John “Four” Flynn, Google DeepMind’s vice president of security and privacy, told Fortune. For one thing, as workflows become increasingly automated, exactly who in the organization an agent is working for can become blurry. For another, in a fully automated workflow, a single AI agent may perform tasks associated with many different roles, not just one. Finally, as AI systems become smarter, organizations may want the same AI agent to perform tasks across many different workflows and processes. In such an environment, static role and even static process-based permissions and access control systems might not work well, Flynn said.

Instead, what’s needed is a far more dynamic access control system that might change in real time based on the task that an AI agent is trying to carry out in that moment and how that task fits into that particular workflow. Flynn said he can imagine scenarios in which the same AI agent might have permissions to perform a certain action or access a particular database for one task in a particular workflow, but would, maybe minutes later, not be allowed the same access in a different workflow. Flynn said what’s needed is a sophisticated AI agent monitoring system that knows what the normal behavior of the agent should look like for any given task in any workflow and spots deviations from that pattern in real time, acting quickly to cut off access if it suspects the AI agent is attempting to do something it shouldn’t be doing.
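The contrast with static, role-based permissions can be made concrete with a minimal sketch. Everything in it is a hypothetical illustration and not DeepMind's system: the `TaskContext` and `TaskScopedPolicy` names, the workflow and resource labels, and the simplification that a single denied request is enough to cut an agent off.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaskContext:
    """Hypothetical key: which task, in which workflow, the agent is doing now."""
    workflow: str
    task: str


@dataclass
class TaskScopedPolicy:
    """Grants are attached to a (workflow, task) context, not to an agent's role."""
    grants: dict = field(default_factory=dict)   # TaskContext -> {(action, resource)}
    revoked: set = field(default_factory=set)    # agent ids that have been cut off

    def allow(self, ctx: TaskContext, action: str, resource: str) -> None:
        self.grants.setdefault(ctx, set()).add((action, resource))

    def check(self, agent_id: str, ctx: TaskContext, action: str, resource: str) -> bool:
        """Return True if the request fits the current task; otherwise cut off access.

        Assumption for illustration only: any out-of-scope request revokes the
        agent. A real monitor would presumably weigh more evidence first.
        """
        if agent_id in self.revoked:
            return False
        permitted = (action, resource) in self.grants.get(ctx, set())
        if not permitted:
            self.revoked.add(agent_id)
        return permitted


policy = TaskScopedPolicy()
billing = TaskContext(workflow="billing", task="reconcile-invoices")
support = TaskContext(workflow="support", task="draft-reply")
policy.allow(billing, "read", "customers_db")
```

With this policy, the same agent may read `customers_db` while reconciling invoices but is denied, and then cut off, if it requests the same database minutes later from the support workflow.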

According to a blog post on its new AI agent security roadmap, DeepMind has already built an internal prototype that asynchronously monitors coding agent trajectories and escalates suspicious findings for human review. This system has analyzed roughly one million coding agent tasks, DeepMind said. That work, the post says, has been “instrumental in building a live monitor for the Gemini Spark agent,” allowing the company to respond in real time to issues such as unintentional data deletion.
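As a rough illustration of what asynchronous trajectory monitoring with human escalation could look like, the sketch below scans finished coding-agent trajectories after the fact and queues suspicious steps for a reviewer. The marker strings, the `Finding` record, and the function names are all assumptions made for this example. The article does not describe how DeepMind's prototype detects suspicious behavior, and simple string matching is only a stand-in.

```python
from dataclasses import dataclass

# Hypothetical markers; a stand-in for whatever detector a real monitor uses.
SUSPICIOUS_MARKERS = ("rm -rf", "DROP TABLE", "--force")


@dataclass
class Finding:
    task_id: str
    step: int
    command: str


def review_trajectory(task_id: str, commands: list[str]) -> list[Finding]:
    """Scan one completed trajectory and return the steps worth a human look."""
    return [
        Finding(task_id, step, command)
        for step, command in enumerate(commands)
        if any(marker in command for marker in SUSPICIOUS_MARKERS)
    ]


def escalate(trajectories: dict[str, list[str]]) -> list[Finding]:
    """Build a human-review queue from a batch of finished trajectories."""
    queue: list[Finding] = []
    for task_id, commands in trajectories.items():
        queue.extend(review_trajectory(task_id, commands))
    return queue


review_queue = escalate({
    "task-1": ["ls", "rm -rf build/"],
    "task-2": ["git status", "pytest"],
})
```

Here `review_queue` holds a single finding, step 1 of `task-1`. A flagged step like this is not necessarily malicious: deleting a build directory may simply be an overeager cleanup.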

Notably, DeepMind says most of what its monitors flag is not malicious. The “majority of flagged events do not stem from adversarial intent,” the blog post says; they more often result from “agent misinterpretation or overeagerness to achieve a user’s goal.”

This kind of real-time behavior monitoring is just one of some 15 different ways to mitigate the risk of rogue AI agents that Google DeepMind proposes in its roadmap. Others range from simple network activity logs, which could be either reviewed in real time or audited later, to using a less sophisticated but trusted AI model to monitor the “reasoning traces” of an AI agent. Reasoning traces are the kind of explicit, textual “chain of thought” that an AI model outputs when it is reasoning about what actions to take. But researchers have found that these reasoning traces do not always accurately reflect what the model is actually “thinking.” So the roadmap’s authors also propose monitoring the activation patterns of the digital neurons inside the large neural networks that power today’s AI models in order to pick up signs that the model is acting deceptively. Google DeepMind compares this to using “fMRI” scans of a person’s brain to attempt to read their emotional state or thoughts in real time.

In the technical paper on the AI agent security roadmap, Google DeepMind develops a framework it calls TRAIT&R — a Taxonomy of Rogue AI Tactics and Routines — that is closely modeled on ATT&CK, a knowledge base of adversarial behaviors in cybersecurity that is maintained by MITRE, the U.S. non-profit research and systems engineering consultancy that often does work for federal agencies. The taxonomy maps out three categories of threats which Google DeepMind says it wants to defend against: “loss of control” (such as a model setting up a hidden, unmonitored internal deployment), “work sabotage” (such as agents quietly degrading alignment or safety research), and “direct harm” (such as exfiltrating model weights or destroying critical assets).
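For readers who think in data structures, the three categories can be encoded as a small lookup table. This is only an illustrative encoding, using the examples quoted above. The paper's actual structure for the taxonomy (for instance, whether it breaks categories into tactics and techniques the way ATT&CK does) is not described here, and `ThreatCategory`, `EXAMPLES`, and `categorize` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class ThreatCategory(Enum):
    LOSS_OF_CONTROL = "loss of control"
    WORK_SABOTAGE = "work sabotage"
    DIRECT_HARM = "direct harm"


# Example behaviors taken from the article's description of each category.
EXAMPLES = {
    ThreatCategory.LOSS_OF_CONTROL: [
        "setting up a hidden, unmonitored internal deployment",
    ],
    ThreatCategory.WORK_SABOTAGE: [
        "quietly degrading alignment or safety research",
    ],
    ThreatCategory.DIRECT_HARM: [
        "exfiltrating model weights",
        "destroying critical assets",
    ],
}


def categorize(behavior: str) -> Optional[ThreatCategory]:
    """Look up which category a known example behavior belongs to, if any."""
    for category, behaviors in EXAMPLES.items():
        if behavior in behaviors:
            return category
    return None
```

A monitor's alert could then carry a category tag, for example `categorize("exfiltrating model weights")`, so that reviewers can triage by threat type.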

Flynn, the security VP, said in the interview that the company is already “pretty far underway” implementing its roadmap. “A lot of this is well underway and some of this is in production—a lot on access control, detection, chain-of-thought monitoring. A significant amount of it is in place and underway,” he said.

Shah said the framework is also designed to catch more subtle forms of harm. Work sabotage, he noted, “could be achieved by persuasion—presenting flawed results and hiding the flaws” so users “come to incorrect conclusions”—a category the paper acknowledges is among the hardest to detect.

The roadmap, which DeepMind has labeled “v0.1,” is described as a work in progress that the company hopes to fold into its broader Frontier Safety Framework once it matures.