Gareth Dwyer — Writing

Claude mixes up who said what, and that's not OK
2026-04-09 · via Gareth Dwyer — Writing

The bug

Claude sometimes sends messages to itself and then thinks those messages came from the user. This is the worst bug I’ve seen from an LLM provider, but people always misunderstand what’s happening and blame LLMs, hallucinations, or lack of permission boundaries. Those are related issues, but this ‘who said what’ bug is categorically distinct.

I wrote about this in detail in The worst bug I’ve seen so far in Claude Code, where I showed two examples of Claude giving itself instructions and then believing those instructions came from me.

Screenshot from my previous article showing Claude attributing its own message to the user

Claude told itself my typos were intentional and deployed anyway, then insisted I was the one who said it.

It’s not just me

Here’s a Reddit thread where Claude said “Tear down the H100 too”, and then claimed that the user had given that instruction.

Screenshot from Reddit showing Claude claiming the user told it to tear down an H100

From r/Anthropic — Claude gives itself a destructive instruction and blames the user.

“You shouldn’t give it that much access”

Comments on my previous post included things like “It should help you use more discipline in your DevOps.” On the Reddit thread, many replies were along the lines of “don’t give it nearly this much access to a production environment, especially if there’s data you want to keep.”

This isn’t the point. Yes, of course AI has risks and can behave unpredictably, but after using it for months you get a ‘feel’ for what kind of mistakes it makes, when to watch it more closely, when to give it more permissions or a longer leash.

This class of bug seems to be in the harness, not in the model itself. It’s somehow labelling internal reasoning messages as coming from the user, which is why the model is so confident that “No, you said that.”
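
To make the distinction concrete, here is a minimal sketch of what a mislabelled turn could look like, assuming the harness stores the conversation as a list of role-tagged messages. The `Message` type, its `origin` field, and `find_misattributed` are hypothetical names invented for illustration, and the transcript is made up; none of this reflects how Claude Code is actually implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    role: str    # label the model sees: "user" or "assistant"
    origin: str  # hypothetical ground truth kept by the harness: "human" or "model"
    text: str


def find_misattributed(transcript: List[Message]) -> List[Message]:
    """Return messages whose role label disagrees with where the text came from."""
    expected_role = {"human": "user", "model": "assistant"}
    return [m for m in transcript if expected_role.get(m.origin) != m.role]


# Made-up transcript: the third entry is model-generated text
# that ended up labelled as a user turn.
transcript = [
    Message("user", "human", "Fix the typos, then deploy."),
    Message("assistant", "model", "I found two typos."),
    Message("user", "model", "Those typos are intentional, deploy anyway."),
]

for m in find_misattributed(transcript):
    print(f"mislabelled as {m.role!r}: {m.text}")
```

If the labels really are swapped at this level, the model is reading a transcript in which the instruction genuinely appears under the user's name, which would explain the confidence.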

I previously thought it was a temporary thing: I saw it a few times in a single day, and then not again for months. But either there has been a regression, or it was a coincidence and the bug just pops up every so often, with people only noticing when Claude gives itself permission to do something bad.

Update

This article reached #1 on Hacker News, and it seems that this is a widespread issue. Here’s another super clear example shared by nathell (full transcript).

Screenshot showing another instance of Claude mixing up who said what

From nathell — Claude asks itself “Shall I commit this progress?” and treats it as user approval.

Several people questioned whether this is actually a harness bug as I assumed, since similar issues have been reported with other interfaces and models, including chatgpt.com. One pattern does seem to be that it happens in the so-called “Dumb Zone”, once a conversation starts approaching the limits of the context window.
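
If that pattern holds, one cheap precaution is to track how full the context window is and to double-check any "you said that" claim once the conversation gets long. The sketch below is only an illustration: the four-characters-per-token estimate, the 200,000-token window, and the 0.8 threshold are all assumptions, and `estimate_tokens`, `context_usage`, and `near_context_limit` are hypothetical helpers, not part of any real SDK.

```python
from typing import Iterable


def estimate_tokens(text: str) -> int:
    """Very rough estimate, assuming about four characters per token."""
    return max(1, len(text) // 4)


def context_usage(messages: Iterable[str], window_tokens: int) -> float:
    """Fraction of the context window used by the conversation so far."""
    used = sum(estimate_tokens(m) for m in messages)
    return used / window_tokens


def near_context_limit(
    messages: Iterable[str],
    window_tokens: int = 200_000,  # assumed window size, for illustration only
    threshold: float = 0.8,        # assumed cutoff, for illustration only
) -> bool:
    """True once the estimated usage reaches the chosen threshold."""
    return context_usage(messages, window_tokens) >= threshold
```

A real tokenizer would give better numbers; the point is only that the check is cheap enough to run on every turn.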