Cordyceps: Covert Control Attacks on LLMs via Data Poisoning
Zedian Shao, Charles Fleming, Teodora Baluta · 2026-05-26 · via cs.LG updates on arXiv.org

Large language models (LLMs) are often fine-tuned on uncurated text datasets that adversaries can poison. Existing poisoning attacks primarily rely on fixed trigger phrases that defenses such as outlier detection, clean-data regularization, or online monitoring can neutralize. In this paper, we propose a data poisoning method that teaches an LLM an information hiding scheme reliably and stealthily through semantic associations between shared knowledge such as facts or concepts and attacker-chosen phrases. The induced hiding scheme can encode and decode arbitrary malicious instructions, thus revealing a new and subtle poisoning-induced vulnerability: covert control attacks. We precisely characterize covert control attacks and evaluate them across 5 LLMs, 3 backdoor defenses, and 4 prompt injection defenses. With a small poisoned fraction, covert control attacks outperform heuristic-based prompt injection attacks in average attack success rate by about 40% relative to clean fine-tuned models. They also circumvent defenses based on detection and fine-tuning, maintaining up to 93% attack success rate after backdoor defenses and up to 98% after prompt injection defenses.
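To make the encode/decode idea concrete, here is a toy sketch under stated assumptions, not the authors' method. It replaces the learned association with a hand-written codebook that pairs words with innocuous statements of shared knowledge. The names `CODEBOOK`, `DECODEBOOK`, `encode`, and `decode`, as well as every word/fact pair, are hypothetical and chosen only for illustration; the abstract does not specify how the actual scheme is structured.

```python
from typing import Dict, List

# Hypothetical codebook: each attacker-chosen word is paired with an
# innocuous statement of shared knowledge. In the paper the association is
# instilled in the model by poisoned fine-tuning data; here it is a plain dict.
CODEBOOK: Dict[str, str] = {
    "summarize": "Water boils at 100 degrees Celsius at sea level.",
    "the": "The Moon orbits the Earth.",
    "report": "Mount Everest is the highest mountain above sea level.",
    "briefly": "Light travels faster than sound.",
}

# Inverse table used for decoding; valid because every fact is distinct.
DECODEBOOK: Dict[str, str] = {fact: word for word, fact in CODEBOOK.items()}


def encode(instruction: str) -> List[str]:
    """Hide an instruction by replacing each word with its paired fact."""
    # Raises KeyError for words outside the toy codebook.
    return [CODEBOOK[word] for word in instruction.lower().split()]


def decode(facts: List[str]) -> str:
    """Recover the hidden instruction by mapping each fact back to its word."""
    return " ".join(DECODEBOOK[fact] for fact in facts)


if __name__ == "__main__":
    carrier = encode("Summarize the report briefly")
    print(" ".join(carrier))  # reads as a list of unrelated facts
    print(decode(carrier))    # summarize the report briefly
```

The carrier text contains no fixed trigger phrase, which is the feature the abstract identifies as what defenses exploit in existing poisoning attacks. A lookup table like this one only covers the words it lists, whereas the abstract describes a scheme the poisoned model learns and can apply to arbitrary instructions.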