

















Abstract:Few-shot adaptation of vision-language models remains fundamentally limited by how negative class signals are handled at inference. Existing methods apply uniform negative suppression across all queries, ignoring that the most damaging confusions are query-specific and shift with support-set geometry. We introduce SCAN (Selective Confusion-Aware Negatives), a framework that addresses this gap through three targeted contributions. In inference, query-adaptive negative routing restricts suppression to the top-K most confusable classes per query, requiring zero additional parameters. Generic negative text templates are replaced with LLM-bootstrapped contrastive prompts that describe discriminative attributes between confusable class pairs, sharpening the textual decision boundary where it matters most. A parameter-free adaptive fusion weight estimated from support-set Fisher discriminability removes the need for manual tuning of the vision-language trade-off. Evaluated across 11 standard benchmarks, SCAN consistently outperforms prior prompt-based and adapter-based methods by an average of 4.61% at 16-shot, with gains of up to 7.70% on fine-grained datasets where inter-class confusion is most severe. SCAN also generalizes strongly under distribution shift, improving by 2.95% on average across four ImageNet OOD variants, and maintains robust performance under significant label noise, with accuracy under 50% label corruption still exceeding the clean baseline of the strongest competing method.
| Subjects: | Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Robotics (cs.RO) |
| Cite as: | arXiv:2505.11758 [cs.CV] |
| (or arXiv:2505.11758v2 [cs.CV] for this version) | |
| https://doi.org/10.48550/arXiv.2505.11758 arXiv-issued DOI via DataCite |
From: Sriram Mandalika [view email]
[v1]
Fri, 16 May 2025 23:39:34 UTC (1,984 KB)
[v2]
Mon, 25 May 2026 11:20:14 UTC (1,191 KB)
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。