





















Abstract: Machine unlearning removes designated concepts or knowledge from pre-trained models. Recent work has extended this paradigm to speaker identity unlearning in zero-shot text-to-speech (ZS-TTS), the task of selectively erasing a model's ability to replicate a speaker's voice. Existing methods, however, implicitly assume that all unlearning requests arrive at once. This assumption is unrealistic, since privacy-motivated removal requests arrive sequentially over time. We show that it breaks state-of-the-art methods: unlearning each new speaker fully revives previously unlearned speakers, reintroducing the very privacy risk unlearning was meant to eliminate. We present Cumulative ORThogonal Identity Suppression (CORTIS), the first framework for continual speaker identity unlearning in ZS-TTS that requires no access to data from previously unlearned speakers. CORTIS combines Fisher-information-based parameter masking, which localizes updates to speaker-relevant weights, with orthogonal projection against the subspaces spanned by prior unlearning updates. On VoiceBox, CORTIS unlearns each requested speaker while keeping previously unlearned speakers forgotten across long request sequences, substantially outperforming sequential application of prior methods. The demo is available at this https URL.
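The two ingredients named in the abstract can be illustrated with a toy sketch. This is not the CORTIS implementation: the function names, the top-k masking rule, the `keep_ratio` parameter, the flat-vector view of the parameters, and the mask-then-project ordering are all assumptions made for illustration. It shows a diagonal Fisher estimate (mean squared per-sample gradient) used to select which weights may change, followed by projecting each new update so that it is orthogonal to all earlier unlearning updates. Only the stored basis of past update directions is carried forward, not any data from earlier speakers.

```python
import math

# Hypothetical helpers for illustration only; none of these names come from the paper.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fisher_mask(per_sample_grads, keep_ratio):
    """Binary mask keeping the weights with the largest diagonal Fisher estimate.

    The estimate is the mean of squared per-sample gradients (an assumption;
    the paper may compute or threshold it differently).
    """
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    fisher = [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]
    k = max(1, round(keep_ratio * dim))
    keep = sorted(range(dim), key=lambda i: fisher[i], reverse=True)[:k]
    return [1.0 if i in keep else 0.0 for i in range(dim)]

def project_out(update, basis):
    """Remove the components of `update` along each orthonormal basis vector."""
    out = list(update)
    for b in basis:
        coeff = dot(out, b)
        out = [o - coeff * bi for o, bi in zip(out, b)]
    return out

def unlearning_step(raw_update, mask, basis):
    """Mask the raw update, project it against prior updates, extend the basis."""
    masked = [m * u for m, u in zip(mask, raw_update)]
    update = project_out(masked, basis)
    norm = math.sqrt(dot(update, update))
    if norm > 1e-12:
        # The projected update is already orthogonal to the basis,
        # so appending its normalized form keeps the basis orthonormal.
        basis.append([u / norm for u in update])
    return update

# Toy run with 4 parameters and two sequential unlearning requests.
grads_a = [[2.0, 0.1, 0.0, 1.0], [2.0, -0.1, 0.0, -1.0]]
mask_a = fisher_mask(grads_a, keep_ratio=0.5)   # keeps parameters 0 and 3
basis = []                                       # directions of past updates
update_a = unlearning_step([1.0, 5.0, 5.0, 1.0], mask_a, basis)
update_b = unlearning_step([3.0, 0.0, 0.0, 1.0], mask_a, basis)
```

In this toy run the second update has no component along the first, so applying it cannot undo the first speaker's update to first order. Projecting after masking makes the orthogonality exact but can, in general, move weight outside the mask; applying the mask last would trade exact orthogonality for strict locality. Which ordering the paper uses is not stated in the abstract.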
| Comments: | preprint |
| Subjects: | Sound (cs.SD); Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2605.25962 [cs.SD] (or arXiv:2605.25962v1 [cs.SD] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.25962 (arXiv-issued DOI via DataCite, pending registration) |
From: Jinju Kim
[v1] Mon, 25 May 2026 15:40:04 UTC (631 KB)