





















Abstract: Adapting large AI models (LAMs) to personalized edge data is challenging because wireless devices have limited memory, computation, and uplink capacity. Federated fine-tuning preserves data privacy but still requires each device to host the full model, while split learning reduces device memory at the cost of heavy activation transmission. This paper proposes TSFLora, a token-compressed split fine-tuning framework for communication-efficient LAM adaptation at the edge. TSFLora combines attention-guided token selection, token merging, low-bit activation quantization, and LoRA-based adaptation within a split federated training pipeline. The key idea is to compress the intermediate token sequence before transmission so that the system reduces both uplink traffic and server-side processing without changing the frozen backbone. Experiments on ViT models over CIFAR-10, CIFAR-100, and TinyImageNet show that TSFLora achieves up to 6.8× communication reduction and 41% memory saving while maintaining competitive accuracy.
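The compression step described in the abstract (select tokens by attention, merge the rest, quantize before transmission) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the selection rule, the merging rule, or the quantizer, so the sketch assumes top-k selection by a per-token attention score, attention-weighted averaging of the dropped tokens into one extra token, and uniform min-max quantization. All function names (`select_and_merge`, `quantize`, `dequantize`, `payload_bits`) are hypothetical.

```python
from typing import List, Tuple

Token = List[float]


def select_and_merge(tokens: List[Token], attention: List[float], keep: int) -> List[Token]:
    """Keep the `keep` highest-attention tokens (assumed rule) and merge the
    remaining tokens into one attention-weighted average token."""
    if len(tokens) != len(attention):
        raise ValueError("one attention score per token is required")
    if not 0 < keep <= len(tokens):
        raise ValueError("keep must be in 1..len(tokens)")

    # Rank token indices by attention score, highest first.
    order = sorted(range(len(tokens)), key=lambda i: attention[i], reverse=True)
    kept_idx = sorted(order[:keep])  # restore original sequence order
    dropped_idx = order[keep:]

    compressed = [list(tokens[i]) for i in kept_idx]
    if dropped_idx:
        total = sum(attention[i] for i in dropped_idx)
        if total > 0:
            weights = [attention[i] / total for i in dropped_idx]
        else:  # fall back to a plain average if all scores are zero
            weights = [1.0 / len(dropped_idx)] * len(dropped_idx)
        dim = len(tokens[0])
        merged = [
            sum(w * tokens[i][d] for w, i in zip(weights, dropped_idx))
            for d in range(dim)
        ]
        compressed.append(merged)
    return compressed


def quantize(tokens: List[Token], bits: int) -> Tuple[List[List[int]], float, float]:
    """Uniform min-max quantization (assumed scheme) to `bits` bits per value.
    Returns integer codes plus the offset and scale needed to dequantize."""
    flat = [v for t in tokens for v in t]
    lo, hi = min(flat), max(flat)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [
        [min(levels, max(0, round((v - lo) / scale))) for v in t]
        for t in tokens
    ]
    return codes, lo, scale


def dequantize(codes: List[List[int]], lo: float, scale: float) -> List[Token]:
    """Server-side reconstruction of approximate activations."""
    return [[lo + c * scale for c in t] for t in codes]


def payload_bits(num_tokens: int, dim: int, bits: int) -> int:
    """Raw activation payload size, ignoring the small offset/scale header."""
    return num_tokens * dim * bits
```

Under these assumptions, a device holding 8 tokens of dimension 4 in 32-bit floats would send `payload_bits(8, 4, 32)` bits uncompressed. Keeping 3 tokens plus one merged token and quantizing to 4 bits would send `payload_bits(4, 4, 4)` bits instead. These toy figures only show where the savings come from and say nothing about the ratios reported in the paper.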
| Subjects: | Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG) |
| Cite as: | arXiv:2605.23988 [cs.DC] (or arXiv:2605.23988v1 [cs.DC] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.23988 (arXiv-issued DOI via DataCite) |
From: Xianke Qiang
[v1] Sun, 17 May 2026 08:50:01 UTC (3,180 KB)