Abstract: Neologism-aware machine translation aims to translate source sentences containing neologisms into target languages. This field remains underexplored compared with general machine translation (MT). In this paper, we propose NeoAMT, an agentic framework for neologism-aware machine translation equipped with a Wiktionary-based search toolkit. Specifically, we first construct a dedicated dataset for neologism-aware machine translation and build a search toolkit grounded in Wiktionary. The dataset covers 16 languages and 75 translation directions in total and is derived from approximately 10 million records of an English Wiktionary dump. The retrieval corpus of the search toolkit is likewise constructed from around 3 million cleaned records of the same dump. We then use the dataset and toolkit to train a translation agent via reinforcement learning (RL) and to evaluate the accuracy of neologism-aware machine translation. Furthermore, we propose an RL training framework featuring a novel reward design and an adaptive rollout generation strategy that exploits translation difficulty, further improving the translation quality of translation agents that use our search toolkit.
| Comments: | ACL 2026 Main. Fixed minor typos |
| Subjects: | Computation and Language (cs.CL); Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2601.03790 [cs.CL] (or arXiv:2601.03790v4 [cs.CL] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2601.03790 (arXiv-issued DOI via DataCite) |
From: Zhongtao Miao
[v1] Wed, 7 Jan 2026 10:49:00 UTC (867 KB)
[v2] Mon, 12 Jan 2026 15:01:07 UTC (871 KB)
[v3] Mon, 27 Apr 2026 05:04:26 UTC (885 KB)
[v4] Sun, 24 May 2026 21:48:22 UTC (891 KB)