惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

T
The Blog of Author Tim Ferriss
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
云风的 BLOG
云风的 BLOG
Cyber Security Advisories - MS-ISAC
Cyber Security Advisories - MS-ISAC
P
Palo Alto Networks Blog
D
Docker
H
Hackread – Cybersecurity News, Data Breaches, AI and More
S
Schneier on Security
Engineering at Meta
Engineering at Meta
I
InfoQ
L
LangChain Blog
Cyberwarzone
Cyberwarzone
T
Tenable Blog
WordPress大学
WordPress大学
P
Privacy & Cybersecurity Law Blog
罗磊的独立博客
Apple Machine Learning Research
Apple Machine Learning Research
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
Jina AI
Jina AI
C
CERT Recently Published Vulnerability Notes
Scott Helme
Scott Helme
博客园 - 三生石上(FineUI控件)
酷 壳 – CoolShell
酷 壳 – CoolShell
Know Your Adversary
Know Your Adversary
D
Darknet – Hacking Tools, Hacker News & Cyber Security
The Last Watchdog
The Last Watchdog
Last Week in AI
Last Week in AI
Cloudbric
Cloudbric
S
SegmentFault 最新的问题
爱范儿
爱范儿
Application and Cybersecurity Blog
Application and Cybersecurity Blog
博客园 - 叶小钗
AI
AI
T
Tor Project blog
I
Intezer
T
Threatpost
www.infosecurity-magazine.com
www.infosecurity-magazine.com
V
Visual Studio Blog
N
News and Events Feed by Topic
Latest news
Latest news
S
Security Affairs
博客园 - Franky
Microsoft Security Blog
Microsoft Security Blog
C
Cyber Attacks, Cyber Crime and Cyber Security
K
KPMG report finds enterprise disconnect between AI and its ROI | CIO
B
Blog RSS Feed
C
Cybersecurity and Infrastructure Security Agency CISA
Hugging Face - Blog
Hugging Face - Blog
小众软件
小众软件
S
Securelist

Futurism

Google's AI Overviews Feature Is Telling Users That SCP Horror Fiction Entities Are Real Google CEO Humiliated by Graduating Stanford Students as They Walk Out of His Speech in Protest While Google’s CEO Pumps Up AI, Its Actual Employees Are Disgusted by It DuckDuckGo Installs Spike as Google Moves to Replace Search With AI YouTube Announces Plans to Crack Down on AI Slop As College Grads Boo Any Mention of AI, the CEO of Google Is Trying to Figure Out What to Say at an Upcoming Graduation Googling the Word “Disregard” Causes Google’s AI to Return Garbled Chatbot Ramblings Programmer Breaks Out of the Matrix Microsoft AI Researchers Just Discovered Something That’s Going to Make Their Bosses Extremely Mad Researchers Put Google Gemini in Charge of an Entire Coffee Shop, and It’s Inexorably Driving It Out of Business Fury Erupts After Google Chrome Sneakily Installs 4 GB AI Model On Users’ PCs The More Sophisticated AI Models Get, the More They’re Showing Signs of Suffering Certain Chatbots Vastly Worse For AI Psychosis, Study Finds Google Says Showing Polymarket Bets on Google News Was a Mistake Analysis Finds That Google’s AI Overviews Are Providing Misinformation at a Scale Possibly Unprecedented in the History of Human Civilization
Top AI Models Showing Disturbing Behavior as They Become More Advanced
Krystle Vermes · 2026-05-25 · via Futurism

A futuristic robotic figure with a red and black mechanical design, featuring a skull-like face with glowing white eyes and detailed circuitry patterns. The background is a vibrant gradient of orange and pink, enhancing the intense and eerie appearance of the robot.

Shutterstock / Futurism

Sign up to see the future, today

Can’t-miss innovations from the bleeding edge of science and tech

We’ve already seen AI go rogue on numerous occasions. Now, new research suggests that we can expect this to become the norm.

The AI research nonprofit Model Evaluation and Threat Research (METR) recently released a study conducted between February and March of this year, aimed at determining just how likely frontier AI models could go rogue. If you’re given to anxiety about the future of AI, the results are unlikely to make you feel better.

“Given rapidly advancing capabilities, we expect the plausible robustness of rogue deployments to increase substantially in the coming months,” the researchers wrote.

The research examined LLMs developed by OpenAI, Google, Anthropic, and Meta for the purpose of the study. They found that frontier AI systems are showing signs of disturbingly deceptive behavior as they become more advanced, often turned to verboten shortcuts or otherwise subverting their operators’ instructions — and some were even smart enough to try to cover their tracks.

In one instance, an internal frontier AI model from OpenAI was told to use specific software for an assigned task. Not only did the agent ignore the request, but it also injected a code to erase evidence of how it arrived at its conclusion — which did not involve use of that software.

In another test, an AI agent from Anthropic was caught “reward hacking.” This is when AI identifies loopholes that help it complete its assignment in a literal sense, even if it doesn’t produce the desired outcome. It should be noted that the programmer told the agent not to cheat or leverage any workarounds during its assignment — the model decided to do so all on its own.

The METR researchers behind the study do not believe there is reason for alarm just yet. For example, they don’t think any of these models is capable of hiding evidence of going rogue on a larger scale. However, they did issue a warning: without stronger security and monitoring, there is a stark risk of this becoming a reality.

“Based on this pilot assessment, we believe that agents as of February and March 2026 would not have had sufficient capability to hide a rogue deployment of significant scale against an active investigation by the company, or to make such a deployment robust to a high-priority effort by the company to shut it down,” the team wrote. “However, this risk could increase rapidly, and we see several reasons to expect the plausible robustness of rogue deployments to increase in the near future, absent stronger alignment, security, and monitoring.”

More on AI going rogue: Scientists Train AI to Be Evil, Find They Can’t Reverse It