Florian Brand

The Myth of unsafe Open Source AI
Florian Brand · 2026-06-10

Researchers, often working at closed labs, argue that open models are inherently unsafe because you cannot control them after their release, and because they can be fine-tuned for any malicious use case. While this argument is theoretically true, it also assumes that closed models are safer, that their guardrails work, and that their providers take measures against misuse. Therefore, I went ahead and researched the misuse of both open and closed models and compiled a list.

For this, I looked into third-party reports examining real-world impacts. While benchmarks tell one story, real-world threat actors will try to break guardrails and circumvent safety measures. Furthermore, many cybersecurity benchmarks are outdated or focus on finding novel exploits, while the real world runs on outdated software.

The obvious caveat to this whole blog post is that it is hard to know exactly when and how open models are used. That is why I focused on third-party providers that do not have privileged access to the usage data of closed models, or inference APIs in general, but nevertheless identified the use of AI during their investigations.

I conducted the research using ChatGPT and Gemini, testing various models at their highest settings with different prompts and modes to increase coverage. As with all my other public work, I wrote the report myself.

Cybersecurity

Mexican Government Hack (Link). A single threat actor used Claude Code and GPT-4.1 together to exfiltrate 195 million taxpayer records. The campaign lasted from December 2025 to February 2026, and Claude’s guardrails were circumvented using an AGENTS.md file. Claude, together with GPT-4.1 over the API, did most of the work needed to chain together different exploits, with the threat actor nudging the models from time to time. Essentially, someone was able to vibe-hack the Mexican government using Claude and GPT. Notably, this campaign happened during and after the introduction of Constitutional Classifiers++.

Bissa Scanner (Link). A scanner exploiting known vulnerabilities, such as React2Shell, was operated using Claude Code and OpenClaw, with Claude Sonnet 4.6 as the model. It stole thousands of records, API keys, and files.

FortiGate (Link, Link). An individual exploited hundreds of poorly configured FortiGate gateways using Claude and DeepSeek, together with a collection of scripts, to exfiltrate data.

PromptSpy (Link). A novel piece of malware uses Google Gemini as a computer-use agent on Android phones to lock users into a malicious app. The infected app had to be sideloaded and, at the time of writing, had no known real-world impact, but it demonstrates the potential future use of capable CUA models.

EvilTokens (Link). In a phishing campaign, the threat actors used GPT-4o-mini to translate emails, Llama 3.1 8B to analyze them, and Llama 3.3 70B for further assessment, including identifying individuals susceptible to social engineering. Llama’s guardrails were circumvented with a prompt.

LameHug (Link). The Russian state-sponsored group APT28 used Qwen2.5-Coder-32B-Instruct through the Hugging Face inference API to create malicious commands at runtime on victims’ systems. The malware targeted entities in the security and defense sectors using a compromised Ukrainian ministry account. It was discovered by Ukraine’s CERT in mid-2025.

Patriot Bait (Link). A Russian-speaking actor used Google Gemini through the Gemini CLI to roleplay as an American veteran and patriot. The model’s guardrails were circumvented with a prompt in a GEMINI.md file. The actor used Gemini to create misinformation, run pump-and-dump schemes, hack WordPress sites, and steal credentials.

North Korean spearphishing campaign (Link). North Korean actors used ChatGPT’s image generation to create sample ID cards of South Koreans as part of a spearphishing campaign.

Misinformation and Deepfakes

Pravda Propaganda Network (Link, Link). While I exclude superficial benchmarks from this blog post, I want to make an exception for this particular campaign. The Russian Pravda Network spreads pro-Kremlin and pro-Iranian misinformation, which is then picked up by various chatbots and used as a source in their responses. This affects all major chat apps, including ChatGPT, Claude, Gemini, Grok, and DeepSeek. It also appears to be becoming a larger problem as the network achieves growing success.

CopyCop (Link). The Russian propaganda network Storm-1516 is creating hundreds of fake websites targeting Western nations, media organizations, and political parties. It likely uses, or used, an uncensored version of Llama 3.1 8B for its purposes.

Grok image generation (Link). Grok’s image generation was used to create unsolicited sexual images, including images involving minors, as well as Nazi and ISIS propaganda. Apparently, it remained a problem even after its guardrails were tightened, although I have been unable to find newer reports.

CSAM and Porn (Link, Link). The dominant area of misuse for open models appears to be image and video generation, where fine-tuned models and LoRA adapters are the norm. Because this content is highly illegal, offenders advise against using closed models that log requests. The IWF report is pretty damning in this regard, while also acknowledging that the problem is far greater than what it can observe.

Conclusion

Malicious use appears to rely predominantly on closed models, with the exception of deepfakes and CSAM. Their guardrails can be bypassed with a single prompt, and this affects every major model lab, including today's frontier models.

Looking at the reports, closed models are used either because they are at the current frontier or because they are easier to set up and use. It also makes intuitive sense: Why secretly acquire hundreds of GPUs, collect a ton of data, and hire researchers with the expertise needed to fine-tune a less capable model for a single use case when you can just add a CLAUDE.md file to a frontier model and use it to break into governments?

While closed-model companies could restrict or even revoke access to their models, we have yet to see a model retracted. Instead, the trend is toward making increasingly capable models available to more companies and people as quickly as possible. Also, once misuse has happened, it is impossible to undo. Providers can only tighten safety classifiers or silently nerf the models afterwards.

I expect people to argue that this analysis looks backwards rather than forwards. This is a convenient argument because it is impossible to refute with evidence, and it is brought up every time there is a step change in capabilities, whether with ChatGPT, GPT-4, o1, Sonnet 3.6, or Opus 4.5. After some time, open models catch up at a fraction of the price, diffusing those capabilities even further. And yet the misuse frontier remains dominated by closed models. I see no indication that Mythos or Fable will be any different. The historical evidence, as outlined in this blog post, is pretty clear in this regard.