Abstract: Vision-Language-Action (VLA) models have shown great potential for embodied AI by integrating visual perception, language understanding, and action execution. In real-time deployment, these models must process continuous visual streams, incurring substantial computational overhead. Visual token pruning -- a mainstream technique for accelerating Vision-Language Models (VLMs) by retaining salient tokens while discarding redundant ones -- offers a natural candidate solution to this challenge. However, directly applying VLM-oriented pruning methods to VLA inference can cause severe degradation in manipulation performance. Our analysis attributes this degradation to a key mismatch: VLA inference exhibits distinct attention patterns between the vision-language prefill stage and the action-decode stage, so pruning based only on context-prefill semantic salience is biased toward semantic cues and may remove action-critical visual tokens. Motivated by this observation, we propose VLA-Pruner, an effective plug-and-play token pruning method grounded in the visual requirements of VLA inference, further exploiting the temporal continuity of robot manipulation. Specifically, VLA-Pruner estimates visual-token importance from both semantic prefilling and temporally smoothed action relevance, and then applies a Combine-then-Filter strategy to retain compact, non-redundant tokens under the compute budget. Experiments show that VLA-Pruner outperforms state-of-the-art approaches across multiple VLA architectures, achieving up to 1.99x speedup with comparable manipulation quality.
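The two-signal scoring and Combine-then-Filter selection described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the exponential moving average used for temporal smoothing, the union of per-criterion top-k sets as the "combine" step, the greedy cosine-similarity check as the "filter" step, and the hypothetical names `ema_update`, `combine_then_filter`, `alpha`, and `sim_threshold`.

```python
import numpy as np


def ema_update(prev, current, alpha=0.5):
    """Smooth action-decode attention over timesteps (assumed EMA, not the paper's rule)."""
    if prev is None:
        return current.copy()
    return alpha * current + (1.0 - alpha) * prev


def combine_then_filter(semantic, action, features, budget, sim_threshold=0.9):
    """Return sorted indices of at most `budget` visual tokens.

    semantic: (N,) prefill-stage importance per visual token.
    action:   (N,) temporally smoothed action-decode importance per token.
    features: (N, D) token embeddings, used only for the redundancy check.
    """
    k = min(budget, len(semantic))

    # Combine: candidates are the union of the top-k tokens under each criterion.
    top_sem = np.argsort(-semantic, kind="stable")[:k]
    top_act = np.argsort(-action, kind="stable")[:k]
    candidates = np.union1d(top_sem, top_act)

    # Rank candidates by the larger of their two max-normalized scores.
    sem_n = semantic / (semantic.max() + 1e-8)
    act_n = action / (action.max() + 1e-8)
    score = np.maximum(sem_n, act_n)
    order = candidates[np.argsort(-score[candidates], kind="stable")]

    # Filter: greedily keep tokens that are not near-duplicates of kept ones.
    unit = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    kept = []
    for idx in order:
        if len(kept) == budget:
            break
        if kept and np.max(unit[kept] @ unit[idx]) > sim_threshold:
            continue  # too similar to an already retained token
        kept.append(int(idx))
    return sorted(kept)


# Toy example: tokens 0 and 3 share the same embedding, so one is redundant.
semantic = np.array([0.9, 0.1, 0.1, 0.8])
action = ema_update(None, np.array([0.1, 0.9, 0.1, 0.1]))
features = np.eye(4)[[0, 1, 2, 0]]
selected = combine_then_filter(semantic, action, features, budget=3)
```

In this toy case token 3 scores highly on the semantic criterion but duplicates token 0, so the filter step passes over it and the budget is filled by a non-redundant token instead. A pruner that ranked on `semantic` alone would have dropped token 1, the only token with high action relevance.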
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2511.16449 [cs.CV] (or arXiv:2511.16449v5 [cs.CV] for this version)
DOI: https://doi.org/10.48550/arXiv.2511.16449 (arXiv-issued DOI via DataCite)
From: Ziyan Liu
[v1] Thu, 20 Nov 2025 15:16:09 UTC (1,779 KB)
[v2] Fri, 21 Nov 2025 11:57:47 UTC (1,779 KB)
[v3] Tue, 10 Feb 2026 05:44:18 UTC (1,786 KB)
[v4] Mon, 25 May 2026 17:05:43 UTC (1,763 KB)
[v5] Tue, 26 May 2026 14:15:17 UTC (1,758 KB)