

















Abstract: The recently proposed Large Concept Model (LCM) generates text by predicting a sequence of sentence-level embeddings and training with either mean-squared error or diffusion objectives. We present SONAR-LLM, a decoder-only transformer that "thinks" in the same continuous SONAR embedding space, yet is supervised through token-level cross-entropy propagated via the frozen SONAR decoder. This hybrid objective retains the semantic abstraction of LCM while eliminating its diffusion sampler and restoring a likelihood-based training signal. Across model sizes from 39M to 1.3B parameters, SONAR-LLM attains competitive generation quality. We report scaling trends, ablations, and benchmark results, and we release the complete training code and all pretrained checkpoints to foster reproducibility and future research.
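To make the training signal described in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The frozen SONAR decoder is replaced by a hypothetical toy stand-in (one fixed linear map per token position, named `DECODER_WEIGHTS`), whereas the real decoder is an autoregressive network. The point is only to show that a token-level cross-entropy loss can be computed from a predicted sentence embedding and that its gradient reaches the embedding while the decoder parameters stay untouched. All names, sizes, and values below are assumptions chosen for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_cross_entropy(embedding, decoder_weights, target_tokens):
    """Return (mean token-level loss, gradient of that loss w.r.t. embedding).

    decoder_weights[position][token_id] is a weight row of the toy decoder.
    The weights are only read, never updated: the decoder is frozen.
    """
    dim = len(embedding)
    loss = 0.0
    grad = [0.0] * dim
    for position, target in enumerate(target_tokens):
        rows = decoder_weights[position]
        logits = [sum(w * e for w, e in zip(row, embedding)) for row in rows]
        probs = softmax(logits)
        loss -= math.log(probs[target])
        # d(loss)/d(logit) = prob - one_hot(target); chain through the linear map.
        for token_id, row in enumerate(rows):
            delta = probs[token_id] - (1.0 if token_id == target else 0.0)
            for i in range(dim):
                grad[i] += delta * row[i]
    count = len(target_tokens)
    return loss / count, [g / count for g in grad]

# Hypothetical toy setup: 2-dim embedding, 3-token vocabulary, 2 token positions.
DECODER_WEIGHTS = [
    [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]],
    [[0.5, -0.5], [-0.5, 0.5], [1.0, 1.0]],
]
TARGET_TOKENS = [1, 2]

# Stand-in for the sentence embedding predicted by the language model.
predicted_embedding = [0.2, -0.1]
loss_before, grad = token_cross_entropy(
    predicted_embedding, DECODER_WEIGHTS, TARGET_TOKENS
)

# One gradient step on the embedding. In a real model this gradient would
# continue backward into the transformer that produced the embedding.
updated_embedding = [e - 0.5 * g for e, g in zip(predicted_embedding, grad)]
loss_after, _ = token_cross_entropy(
    updated_embedding, DECODER_WEIGHTS, TARGET_TOKENS
)
```

In this toy setting, the step along the negative gradient lowers the token-level loss without modifying the decoder, which is the sense in which the supervision is "propagated via" a frozen decoder.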
| Subjects: | Computation and Language (cs.CL) |
| Cite as: | arXiv:2508.05305 [cs.CL] |
| | (or arXiv:2508.05305v2 [cs.CL] for this version) |
| | https://doi.org/10.48550/arXiv.2508.05305 (arXiv-issued DOI via DataCite) |
From: Nikita Dragunov
[v1] Thu, 7 Aug 2025 12:03:44 UTC (119 KB)
[v2] Mon, 25 May 2026 21:53:05 UTC (175 KB)