Abstract: In speech processing, most state-of-the-art sequence prediction models rely on auto-regressive (AR) strategies to generate output sequences from the raw predictions of the model. Despite their crucial role in the inference process, a comprehensive overview of AR strategies as a unified field is lacking, largely because next-token decoding is defined implicitly and in multiple ways. This complicates the choice, comparison, and evaluation of strategies, and creates inconsistencies in whether approaches are characterized as auto-regressive or not. We begin by setting explicit inclusion criteria for the field of AR search in speech processing, and derive a generalized theoretical framework to categorize and report on search strategies for neural models. We demonstrate how this formalism simplifies the design of benchmarks centered on the decoding process, enabling ablation studies focused on search strategies.
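The abstract does not spell out the paper's framework, so the following is only a generic illustration of what "AR search strategies" over a model's raw next-token predictions can look like, not the authors' formalism. It is a minimal sketch, assuming a hypothetical toy scorer `toy_score` that returns log-probabilities over a three-token vocabulary given a prefix, with token `0` assumed to be end-of-sequence. The same scorer is decoded with two standard strategies, greedy search and beam search, which shows how the choice of search strategy alone can change the output sequence.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical scorer type: prefix of token ids -> log-probabilities over the vocab.
# In a real system this would be a neural model's log-softmax output.
Scorer = Callable[[Sequence[int]], List[float]]

EOS = 0  # assumed end-of-sequence token id (illustrative only)

# Hypothetical next-token probability table; ids 1 and 2 stand for two ordinary tokens.
_TABLE: Dict[Tuple[int, ...], List[float]] = {
    (): [0.02, 0.55, 0.43],
    (1,): [0.40, 0.30, 0.30],
    (2,): [0.90, 0.05, 0.05],
}


def toy_score(prefix: Sequence[int]) -> List[float]:
    """Toy model: look up next-token probabilities and return their logs."""
    probs = _TABLE.get(tuple(prefix), [0.98, 0.01, 0.01])
    return [math.log(p) for p in probs]


def greedy_search(score: Scorer, max_len: int) -> List[int]:
    """Append the single most probable next token at every step."""
    seq: List[int] = []
    for _ in range(max_len):
        logp = score(seq)
        nxt = max(range(len(logp)), key=logp.__getitem__)
        seq.append(nxt)
        if nxt == EOS:
            break
    return seq


def beam_search(score: Scorer, max_len: int, beam_size: int) -> List[int]:
    """Keep the beam_size best prefixes ranked by summed log-probability."""
    beams: List[Tuple[float, List[int]]] = [(0.0, [])]
    finished: List[Tuple[float, List[int]]] = []
    for _ in range(max_len):
        # Expand every active hypothesis by every vocabulary token.
        candidates = [
            (total + lp, seq + [tok])
            for total, seq in beams
            for tok, lp in enumerate(score(seq))
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        # Hypotheses ending in EOS leave the beam, so the active beam may shrink.
        for total, seq in candidates[:beam_size]:
            (finished if seq[-1] == EOS else beams).append((total, seq))
        if not beams:
            break
    finished.extend(beams)  # unfinished hypotheses still compete at max_len
    return max(finished, key=lambda c: c[0])[1]


if __name__ == "__main__":
    print(greedy_search(toy_score, max_len=5))               # [1, 0]
    print(beam_search(toy_score, max_len=5, beam_size=2))    # [2, 0]
```

With these made-up probabilities, greedy search commits to token `1` (p = 0.55) and ends with sequence probability 0.55 × 0.40 = 0.22, while beam search with `beam_size=2` keeps token `2` alive and finds the better sequence `[2, 0]` (0.43 × 0.90 ≈ 0.387). With `beam_size=1` the beam search reduces to the greedy result.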
From: Julia Gachot
[v1] Tue, 16 Jun 2026 13:31:39 UTC (81 KB)