Abstract: Quaternion neural networks are parameter-efficient and model multidimensional dependencies by representing four related features as a single entity. However, existing quaternion self-attention computes component-wise scores and applies independent softmax operations to each component, which increases the computational cost and allows attention distributions to diverge across components. We propose a shared-score quaternion self-attention mechanism that computes a single real-valued score using the quaternion inner product and applies a shared attention distribution across all components. This reduces score-computation multiplications by 75% and the number of softmax operations from four to one. We prove that, when queries and keys are produced by quaternion linear projections that induce component pre-mixing, the component-wise and shared scores lie in the same interaction subspace, indicating that independent component-wise attention primarily re-parameterizes the same interactions rather than expanding the feature interaction space. In speech enhancement, our method reduces inference time by up to 44.3% on a GPU and 58.1% on a CPU while maintaining quality, with consistent trends across vision and natural language processing.
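The contrast between component-wise and shared scores can be illustrated with a small pure-Python sketch. It assumes, as an illustration rather than the authors' code, that the component-wise baseline scores a query/key pair with the quaternion-valued sum of Hamilton products q ⊗ conj(k), and that the shared score is the real part of that sum, which equals the ordinary inner product of the four components. Under this assumption each element costs 16 real multiplications for the quaternion-valued score and 4 for the shared score. The function names and the 1/sqrt(4·dim) scaling are hypothetical choices made for this sketch.

```python
import math

# A quaternion is a tuple (r, i, j, k); a quaternion vector is a list of them.

def hamilton(p, q):
    """Hamilton product p ⊗ q (16 real multiplications)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def conj(q):
    r, i, j, k = q
    return (r, -i, -j, -k)

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def componentwise_scores(qv, kv):
    """Hypothetical baseline: quaternion-valued score sum_d q_d ⊗ conj(k_d).
    Each of its 4 components would get its own softmax (4 softmaxes)."""
    acc = [0.0, 0.0, 0.0, 0.0]
    for q, k in zip(qv, kv):
        for c, val in enumerate(hamilton(q, conj(k))):
            acc[c] += val
    return tuple(acc)

def shared_score(qv, kv):
    """Real part of sum_d q_d ⊗ conj(k_d): 4 multiplications per element."""
    return sum(a * b for q, k in zip(qv, kv) for a, b in zip(q, k))

def shared_score_attention(Q, K, V):
    """One softmax per query; the same weights mix all four components of V."""
    dim = len(Q[0])
    scale = 1.0 / math.sqrt(4 * dim)  # assumed scaling over 4*dim real features
    out = []
    for qv in Q:
        weights = softmax([scale * shared_score(qv, kv) for kv in K])
        out.append([
            tuple(sum(w * vv[d][c] for w, vv in zip(weights, V)) for c in range(4))
            for d in range(dim)
        ])
    return out

qv = [(1.0, 2.0, 3.0, 4.0), (0.5, -1.0, 0.0, 2.0)]
kv = [(2.0, 0.0, -1.0, 1.0), (1.0, 1.0, 1.0, 1.0)]
print(shared_score(qv, kv), componentwise_scores(qv, kv)[0])  # equal values
```

Because the shared weights are a single distribution per query, the four output components cannot attend to different keys, which is the divergence the abstract describes for the component-wise variant.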
| Comments: | 26 pages, 6 figures and 15 tables. Accepted at ICML2026 |
| Subjects: | Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML) |
| Cite as: | arXiv:2605.24920 [cs.LG] |
| | (or arXiv:2605.24920v1 [cs.LG] for this version) |
| | https://doi.org/10.48550/arXiv.2605.24920 (arXiv-issued DOI via DataCite, pending registration) |
From: Tohru Nitta
[v1] Sun, 24 May 2026 07:52:19 UTC (2,982 KB)