























Authors:Youjun Chen, Xurong Xie, Mengzhe Geng, Zengrui Jin, Jiajun Deng, Guinan Li, Shujie Hu, Huimeng Wang, Haoning Xu, Chengxi Deng, Bowen Zhang, Xunying Liu
Abstract:Explainable and trustworthy speech emotion recognition (SER) remains a challenging task to date, largely due to the scarcity of SER data with reliable speech emotion descriptor (SED) labels, such as prosodic features and speaker traits. This paper presents a confidence score and reinforcement learning (RL) based on-the-fly SED rectification approach for post-training SER systems on automatically annotated SED labels. Experiments on IEMOCAP and MELD suggest that explainable SER systems incorporating the proposed confidence score and RL-based SED rectification approach consistently outperform baselines without data selection or SED rectification. The best performing system, which integrates both components, surpasses the baseline without data selection and SED rectification, achieving SER gains of 2.9% and 3.3% absolute (3.7% and 5.4% relative) on IEMOCAP and MELD benchmarks, respectively.
From: Youjun Chen [view email]
[v1]
Fri, 12 Jun 2026 04:02:20 UTC (2,310 KB)
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。