“They screwed us”: Personality clashes sent Anthropic’s models offline
Simon Willison · 2026-06-15 · via Simon Willison's Weblog

15th June 2026 - Link Blog

"They screwed us": Personality clashes sent Anthropic's models offline. Lots of "source familiar with the administration's thinking" and "source close to Anthropic" in this Axios piece, which is the best collection of behind-the-scenes gossip I've seen about the US government export control Mythos/Fable story so far.

Logan Graham ("I lead the Frontier Red Team at Anthropic"), Dave Orr (Head of Safeguards, previously a Director of Engineering at Google DeepMind), and blog favorite Nicholas Carlini are reported to be meeting with the Commerce Department today in D.C. Good luck to them!

(I just noticed Logan was "Special Adviser to the Prime Minister" in the Boris Johnson era, covering AI, science, and technology policy - so significant political experience.)

This closing note doesn't give me much optimism that we'll be getting Fable back any time soon:

The bottom line: One option is to make sure Anthropic's models can't be jailbroken — though perfect jailbreak resistance may be impossible.

Absent that, a source familiar with the administration's thinking said it may simply come down to an attitude fix where, instead of feeling dismissed, "everyone feels safe, secure and happy."

This made me wonder if Anthropic ever successfully addressed the class of attacks described in the Universal and Transferable Adversarial Attacks on Aligned Language Models paper from 2023.

It looks like their Constitutional Classifiers work (that post is from January this year) is relevant to that. They continue to claim that no "universal jailbreak" has been found against Claude Mythos, classifying the jailbreak that triggered the US government response as "a potential narrow, non-universal jailbreak".