Abstract: Autoregressive models (ARMs) currently constitute the dominant paradigm for large language models (LLMs). Energy-based models (EBMs) represent another class of models, which have historically been less prevalent in LLM development, yet naturally characterize the optimal policy in post-training alignment. In this paper, we provide a unified view of these two model classes. Taking the chain rule of probability as a starting point, we establish an explicit bijection between ARMs and EBMs in function space, which we show to correspond to a special case of the soft Bellman equation in maximum entropy reinforcement learning. Building upon this bijection, we derive the equivalence between supervised learning of ARMs and EBMs. Furthermore, we analyze the distillation of EBMs into ARMs by providing theoretical error bounds. Our results provide insights into the ability of ARMs to plan ahead, despite being based on the next-token prediction paradigm.
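The link the abstract describes (a sequence-level EBM read as an ARM through the chain rule, with a soft-Bellman-style recursion) can be illustrated on a toy problem. The sketch below is a minimal illustration under our own assumptions, not the paper's code or exact formulation: a finite vocabulary, a fixed sequence length, a hypothetical `reward` function standing in for the negative energy, and no reference model. It uses the standard soft-value recursion V(prefix) = logsumexp over next tokens of V(prefix + token), sets each next-token log-probability to V(prefix + token) - V(prefix), and checks that the chain-rule product recovers exp(reward) / Z.

```python
import itertools
import math

# Hypothetical toy setting (an assumption, not from the paper):
# a 3-token vocabulary, fixed length 3, and an arbitrary sequence-level score.
VOCAB = (0, 1, 2)
LENGTH = 3


def reward(seq):
    """Hypothetical score r(y) = -E(y); the EBM is p(y) proportional to exp(r(y))."""
    return 0.5 * sum(seq) - 1.0 * (seq[0] == seq[-1])


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def soft_value(prefix):
    """Soft value V(prefix): log of the summed exp(reward) over all completions.

    Satisfies the recursion V(s) = logsumexp_a V(s + a), with the terminal
    condition V(y) = r(y) once the sequence reaches LENGTH tokens.
    """
    if len(prefix) == LENGTH:
        return reward(prefix)
    return logsumexp([soft_value(prefix + (a,)) for a in VOCAB])


def next_token_logprob(prefix, token):
    """ARM conditional: log pi(token | prefix) = V(prefix + token) - V(prefix)."""
    return soft_value(prefix + (token,)) - soft_value(prefix)


def arm_logprob(seq):
    """Chain rule: log p(y) = sum over t of log pi(y_t | y_<t)."""
    return sum(next_token_logprob(seq[:t], seq[t]) for t in range(LENGTH))


def ebm_logprob(seq):
    """Sequence-level EBM: log p(y) = r(y) - log Z, where log Z = V(empty prefix)."""
    return reward(seq) - soft_value(())


if __name__ == "__main__":
    # The per-step log-probabilities telescope to r(y) - log Z for every sequence.
    for seq in itertools.product(VOCAB, repeat=LENGTH):
        assert abs(arm_logprob(seq) - ebm_logprob(seq)) < 1e-9
    # Each next-token distribution is normalized.
    total = sum(math.exp(next_token_logprob((), a)) for a in VOCAB)
    print(round(total, 6))  # prints 1.0
```

Exhaustive recursion is only feasible for tiny vocabularies and lengths; it is chosen here because it makes the telescoping sum V(y) - V(empty prefix) = r(y) - log Z easy to verify exactly.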
| Subjects: | Machine Learning (cs.LG); Machine Learning (stat.ML) |
|---|---|
| Cite as: | arXiv:2512.15605 [cs.LG] |
| | (or arXiv:2512.15605v4 [cs.LG] for this version) |
| | https://doi.org/10.48550/arXiv.2512.15605 (arXiv-issued DOI via DataCite) |
From: Mathieu Blondel
[v1] Wed, 17 Dec 2025 17:14:26 UTC (942 KB)
[v2] Thu, 29 Jan 2026 13:36:03 UTC (993 KB)
[v3] Tue, 7 Apr 2026 08:58:47 UTC (997 KB)
[v4] Mon, 25 May 2026 15:54:35 UTC (1,030 KB)