





















Abstract:Linear attention mitigates the quadratic complexity of softmax attention but suffers from a critical loss of expressiveness. We identify two primary causes: (1) The normalization operation cancels the query norm, which breaks the correlation between a query's norm and the spikiness (entropy) of the attention distribution as in softmax attention. (2) Standard techniques for enforcing non-negativity cause destructive information loss by nullifying valid inner-product interactions. To address these challenges, we introduce NaLaFormer, a novel linear attention mechanism built upon a norm$\times$direction (ND) decomposition of the query and key vectors. We leverage each component to solve a distinct problem: The query norm is injected into our kernel to create a query-norm-aware map that restores the attention distribution's spikiness. The direction vectors are processed by a geometric, cosine-based similarity metric that guarantees non-negativity while preserving the rich, fine-grained information of the inner product. We validate NaLaFormer through a comprehensive multi-modal evaluation, where it sets new state-of-the-art benchmarks for linear attention. Our model achieves up to a 7.5% accuracy gain on ImageNet-1K and a 4.7% mIoU improvement on ADE20K over comparable baselines. It demonstrates profound efficiency, reducing peak memory by a transformative 92.3% in token-intensive super-resolution tasks (70K+ tokens). NaLaFormer's versatility is further confirmed as it surpasses strong baselines like Mamba on common-sense reasoning and sets a new state-of-the-art on the Long Range Arena (LRA) benchmark. Code is available at this https URL .
| Subjects: | Machine Learning (cs.LG) |
| Cite as: | arXiv:2506.21137 [cs.LG] |
| (or arXiv:2506.21137v3 [cs.LG] for this version) | |
| https://doi.org/10.48550/arXiv.2506.21137 arXiv-issued DOI via DataCite |
From: Liangyu Huo [view email]
[v1]
Thu, 26 Jun 2025 10:47:39 UTC (2,192 KB)
[v2]
Wed, 4 Feb 2026 09:07:29 UTC (8,002 KB)
[v3]
Mon, 25 May 2026 09:25:19 UTC (7,652 KB)
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。