





















Abstract: Competitive music transcription models require large amounts of paired audio-score data, which is scarce because of collection costs, alignment difficulty, and copyright restrictions. Meanwhile, vast quantities of unpaired audio recordings and symbolic scores are freely available but have gone unused. We adopt a cycle-consistent translation framework in which a small amount of paired data acts as a minimal anchor, making the unpaired pool usable for training. We find that (1) unpaired data yields surprisingly large gains, especially under limited supervision; (2) unpaired audio contributes more than unpaired scores; and (3) incorporating unlabeled audio from a new instrument during training improves transcription for that instrument without any paired supervision. Together, these results suggest that scaling unpaired data offers a practical path toward high-quality transcription for instruments where labeled data remains scarce.
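
The cycle-consistent setup described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`combined_loss`, `transcribe`, `synthesize`), the `cycle_weight` parameter, the use of plain float vectors for both audio and scores, and the mean-squared-error distance are all assumptions made for illustration. The idea shown is only that a supervised term on the few paired examples is combined with round-trip reconstruction terms on unpaired audio and unpaired scores.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Model = Callable[[Vector], List[float]]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mean(values: List[float]) -> float:
    """Average of a list, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def combined_loss(
    transcribe: Model,   # hypothetical audio -> score model
    synthesize: Model,   # hypothetical score -> audio model
    paired: Sequence[Tuple[Vector, Vector]],
    unpaired_audio: Sequence[Vector],
    unpaired_scores: Sequence[Vector],
    cycle_weight: float = 1.0,  # assumed hyperparameter, not from the paper
) -> float:
    # Supervised anchor: transcriptions of paired audio should match the scores.
    supervised = mean([mse(transcribe(a), s) for a, s in paired])
    # Audio cycle: audio -> score -> audio should reconstruct the input.
    audio_cycle = mean([mse(synthesize(transcribe(a)), a) for a in unpaired_audio])
    # Score cycle: score -> audio -> score should reconstruct the input.
    score_cycle = mean([mse(transcribe(synthesize(s)), s) for s in unpaired_scores])
    return supervised + cycle_weight * (audio_cycle + score_cycle)


# Toy models: the "transcriber" doubles each value, the "synthesizer" halves it,
# so the two are exact inverses and every loss term vanishes.
def double(v: Vector) -> List[float]:
    return [2.0 * x for x in v]


def halve(v: Vector) -> List[float]:
    return [x / 2.0 for x in v]


def identity(v: Vector) -> List[float]:
    return list(v)


paired_data = [([1.0, 2.0], [2.0, 4.0])]
audio_pool = [[1.0, 2.0]]
score_pool = [[2.0, 4.0]]

consistent = combined_loss(double, halve, paired_data, audio_pool, score_pool)
inconsistent = combined_loss(double, identity, paired_data, audio_pool, score_pool)
```

In this toy setting the loss is zero when the two models invert each other and grows when the synthesizer fails to undo the transcriber, even though the supervised term is unchanged. A real system would use sequence models and a token-level loss on scores rather than vector MSE.
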
| Subjects: | Sound (cs.SD); Machine Learning (cs.LG) |
| Cite as: | arXiv:2605.24193 [cs.SD] |
| | (or arXiv:2605.24193v1 [cs.SD] for this version) |
| | https://doi.org/10.48550/arXiv.2605.24193 (arXiv-issued DOI via DataCite, pending registration) |
From: Saebyeol Shin
[v1] Fri, 22 May 2026 20:32:57 UTC (1,347 KB)