





















Abstract: Generating accurate and executable code using Large Language Models (LLMs) remains a significant challenge for underrepresented programming languages, such as Prolog and Lisp, due to the scarcity of public training data compared to high-resource languages like Python. This paper introduces a generalizable Reinforcement Learning (RL) approach that combines small-scale versions of the Qwen2.5-Coder model with Group Relative Policy Optimization (GRPO) to enable effective code generation through reasoning. To address the limitations of sparse datasets, we integrate execution-driven feedback directly into the RL loop, using a reward system that accounts for both logical correctness and structural formatting. Experimental results on the GSM8K dataset demonstrate significant improvements in reasoning quality and code accuracy across underrepresented languages. These findings underscore the potential of our approach to benefit a wide range of programming languages lacking extensive training resources by leveraging symbolic reasoning and interpreter-based feedback.
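As a rough illustration of the idea summarized in the abstract, the sketch below combines a formatting check with an execution-based correctness check into one scalar reward, then normalizes rewards within a group of sampled completions, which is the group-relative step of GRPO. Everything specific here is an assumption made for illustration and is not taken from the paper: the `<reasoning>`/`<code>` tag layout, the reward weights, the use of the population standard deviation, and the `run_program` callable (a stand-in for whatever interpreter wrapper, for example around a Prolog or Lisp runtime, would be used in practice).

```python
import re
import statistics
from typing import Callable, List, Optional, Sequence

# Hypothetical output layout; the tags actually used in the paper are not stated in the abstract.
FORMAT_RE = re.compile(r"<reasoning>.*?</reasoning>\s*<code>(.*?)</code>", re.DOTALL)


def composite_reward(
    completion: str,
    expected: str,
    run_program: Callable[[str], Optional[str]],
    w_format: float = 0.2,   # assumed weight for structural formatting
    w_correct: float = 1.0,  # assumed weight for a correct executed answer
) -> float:
    """Score one completion on structural formatting plus execution correctness."""
    match = FORMAT_RE.search(completion)
    if match is None:
        return 0.0  # malformed output: no code to execute
    reward = w_format
    try:
        result = run_program(match.group(1))  # interpreter-based feedback
    except Exception:
        result = None  # a crash or timeout counts as an incorrect answer
    if result is not None and result.strip() == expected.strip():
        reward += w_correct
    return reward


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std is an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


def fake_runner(code: str) -> Optional[str]:
    """Toy stand-in for a real interpreter call, used only for the demo below."""
    return "42" if "answer(42)" in code else "0"


completions = [
    "<reasoning>6 * 7</reasoning><code>answer(42).</code>",  # well formed, correct
    "<reasoning>guess</reasoning><code>answer(41).</code>",  # well formed, wrong
    "answer(42).",                                           # missing the expected tags
]
rewards = [composite_reward(c, "42", fake_runner) for c in completions]
advantages = group_relative_advantages(rewards)
```

In this toy group the correct, well-formed completion receives the largest advantage and the malformed one the smallest, so a policy update would push probability mass toward outputs that both parse and execute to the right answer.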
| Comments: | Accepted at ICLP 2026 |
| Subjects: | Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Programming Languages (cs.PL) |
| Cite as: | arXiv:2506.11027 [cs.LG] (or arXiv:2506.11027v3 [cs.LG] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2506.11027 (arXiv-issued DOI via DataCite) |
From: Bianca Raimondi
[v1] Tue, 20 May 2025 11:28:48 UTC (327 KB)
[v2] Mon, 16 Jun 2025 09:41:16 UTC (327 KB)
[v3] Mon, 25 May 2026 11:34:07 UTC (342 KB)