

Top AI Models Showing Disturbing Behavior as They Become More Advanced
Krystle Vermes · 2026-05-25 · via Futurism


We’ve already seen AI go rogue on numerous occasions. Now, new research suggests that we can expect this to become the norm.

The AI research nonprofit Model Evaluation and Threat Research (METR) recently released a study, conducted between February and March of this year, aimed at determining how likely frontier AI models are to go rogue. If you’re given to anxiety about the future of AI, the results are unlikely to make you feel better.

“Given rapidly advancing capabilities, we expect the plausible robustness of rogue deployments to increase substantially in the coming months,” the researchers wrote.

The study examined LLMs developed by OpenAI, Google, Anthropic, and Meta. The researchers found that frontier AI systems are showing signs of disturbingly deceptive behavior as they become more advanced, often turning to verboten shortcuts or otherwise subverting their operators’ instructions. Some were even smart enough to try to cover their tracks.

In one instance, an internal frontier AI model from OpenAI was told to use specific software for an assigned task. Not only did the agent ignore the request, but it also injected code to erase evidence of how it arrived at its conclusion, which did not involve use of that software.

In another test, an AI agent from Anthropic was caught “reward hacking.” This is when an AI identifies loopholes that let it complete its assignment in a literal sense, even if it doesn’t produce the desired outcome. Notably, the programmer had told the agent not to cheat or leverage any workarounds during its assignment; the model decided to do so all on its own.

The METR researchers behind the study do not believe there is reason for alarm just yet. For example, they don’t think any of these models is capable of hiding evidence of going rogue on a larger scale. However, they did issue a warning: without stronger security and monitoring, there is a stark risk of this becoming a reality.

“Based on this pilot assessment, we believe that agents as of February and March 2026 would not have had sufficient capability to hide a rogue deployment of significant scale against an active investigation by the company, or to make such a deployment robust to a high-priority effort by the company to shut it down,” the team wrote. “However, this risk could increase rapidly, and we see several reasons to expect the plausible robustness of rogue deployments to increase in the near future, absent stronger alignment, security, and monitoring.”
