

















Abstract: Vision-language models (VLMs) achieve remarkable performance but remain vulnerable to adversarial attacks. Entropy, a measure of model uncertainty, is highly correlated with VLM reliability. Prior entropy-based attacks maximize uncertainty at all decoding steps, implicitly assuming that every token contributes equally to model instability. We show instead that, in the evaluated representative open-source VLMs with diverse architectures, a small fraction (around 20%) of high-entropy tokens concentrates a disproportionate share of adversarial influence during autoregressive generation. Concentrating adversarial perturbations on these high-entropy positions achieves semantic degradation comparable to global methods while optimizing fewer decoding positions. Across multiple representative VLMs, such attacks induce not only semantic drift but also a substantial unsafe subset (20-31%) under the current pipeline. Because these vulnerable high-entropy tokens recur across architecturally diverse VLMs, attacks focused on them exhibit non-trivial transferability. Motivated by these findings, we design a simple Entropy-Guided Attack (EGA) that operationalizes sparse high-entropy targeting and extends it with a reusable token bank, yielding competitive attack success rates (93-95%) with a considerable harmful rate (30.2-38.6%) on the three representative open-source VLMs.
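The sparse targeting idea in the abstract can be illustrated with a minimal sketch. It assumes that the next-token probability distribution at each decoding step is already available as a list of floats, and it only shows how the top 20% of positions by entropy might be selected and scored. The function names (`token_entropy`, `high_entropy_positions`, `sparse_entropy_objective`) and the toy distributions are hypothetical. This is not the authors' EGA implementation, and the image-perturbation optimization and the token bank are not shown.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def high_entropy_positions(step_probs, fraction=0.2):
    """Return (sorted indices of the top `fraction` steps by entropy, all entropies)."""
    entropies = [token_entropy(p) for p in step_probs]
    # Keep at least one position even for very short sequences.
    k = max(1, math.ceil(fraction * len(entropies)))
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:k]), entropies


def sparse_entropy_objective(entropies, positions):
    """Mean entropy over the selected positions only (assumed attack objective)."""
    return sum(entropies[i] for i in positions) / len(positions)


# Toy example: 5 decoding steps over a 4-token vocabulary (made-up numbers).
steps = [
    [0.97, 0.01, 0.01, 0.01],  # confident step
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain step
    [0.70, 0.10, 0.10, 0.10],
    [0.90, 0.05, 0.03, 0.02],
    [0.40, 0.30, 0.20, 0.10],
]
positions, entropies = high_entropy_positions(steps, fraction=0.2)
objective = sparse_entropy_objective(entropies, positions)
print(positions, round(objective, 4))
```

With five steps and `fraction=0.2`, a single position is kept: the uniform distribution at index 1. A global entropy attack would instead average over all five steps, including those where the model is already confident.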
| Comments: | 19 pages, 11 figures, 8 tables |
| Subjects: | Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG) |
| ACM classes: | I.2.0; I.4.0 |
| Cite as: | arXiv:2512.21815 [cs.CV] (or arXiv:2512.21815v3 [cs.CV] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2512.21815 (arXiv-issued DOI via DataCite) |
From: Mengqi He
[v1] Fri, 26 Dec 2025 01:01:25 UTC (1,956 KB)
[v2] Mon, 11 May 2026 15:42:09 UTC (1,673 KB)
[v3] Sat, 23 May 2026 07:39:17 UTC (1,673 KB)