Abstract: Phonemization is a critical component of text-to-speech synthesis. Traditional approaches rely on deterministic transformations and lexica, while neural methods promise better generalization to out-of-vocabulary (OOV) terms. This work introduces OLaPh (Optimal Language Phonemizer), a hybrid framework that integrates extensive multilingual lexica with advanced NLP techniques and a statistical subword segmentation function. Evaluations on the WikiPron benchmark show that the OLaPh framework significantly outperforms established baselines in overall accuracy and remains robust on OOV data through its fallback mechanisms. To further explore neural generalization, we use the framework to synthesize a high-consistency training corpus for an instruction-tuned Large Language Model (LLM). While the deterministic framework remains more accurate overall, the LLM demonstrates strong generalization, matching or partly exceeding the framework's performance. This suggests that the LLM has internalized phonetic intuitions from the synthetic data that go beyond the framework's capabilities. Together, these tools provide a comprehensive, open-source resource for multilingual grapheme-to-phoneme (G2P) research.
| Comments: | 11 pages, 1 figure, 4 tables |
| Subjects: | Computation and Language (cs.CL) |
| Cite as: | arXiv:2509.20086 [cs.CL] (or arXiv:2509.20086v2 [cs.CL] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2509.20086 (arXiv-issued DOI via DataCite) |
From: Johannes Wirth
[v1] Wed, 24 Sep 2025 13:05:09 UTC (54 KB)
[v2] Sat, 25 Apr 2026 08:45:16 UTC (78 KB)