




















Abstract: In this study, we focus on cough-based tuberculosis screening (CBTS) and hypothesize that fusing speech/audio foundation representations with spectral descriptors will yield stronger screening performance. We expect this fusion to reveal complementary strengths: spectral features preserve fine-grained short-time acoustic detail in cough signals, while foundation embeddings capture higher-level temporal and event-level patterns learned from large-scale pretraining. To this end, we propose COBALT, a novel fusion framework that uses codebook-aligned hyperbolic prototypes and bandit-style reliability weighting to integrate heterogeneous representations effectively. On the CODA TB DREAM Challenge benchmark, COBALT consistently outperforms individual representations and a concatenation baseline. It achieves its best overall performance when fusing MFCC with PaSST, thereby establishing a new state of the art on the benchmark.
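For orientation, the concatenation baseline mentioned above can be sketched as follows. This is a minimal illustration, not the COBALT method and not the authors' code: the mean/std pooling, the L2 normalization, and the names `pool_frames`, `l2_normalize`, and `concat_fusion` are all assumptions, and the toy numbers stand in for real MFCC frames and a real PaSST clip embedding.

```python
from statistics import mean, pstdev


def pool_frames(frames):
    """Summarize per-frame features as per-dimension means followed by std devs."""
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    return [mean(d) for d in dims] + [pstdev(d) for d in dims]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to (approximately) unit length so neither view dominates."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / (norm + eps) for x in vec]


def concat_fusion(spectral_frames, embedding):
    """Assumed baseline: pool the spectral frames, normalize both views, concatenate."""
    spectral = l2_normalize(pool_frames(spectral_frames))
    learned = l2_normalize(embedding)
    return spectral + learned


# Toy stand-ins (hypothetical values): two MFCC frames with two coefficients
# each, and a four-dimensional clip-level foundation embedding.
mfcc_frames = [[1.0, 2.0], [3.0, 4.0]]
passt_embedding = [0.5, 0.5, 0.5, 0.5]
fused = concat_fusion(mfcc_frames, passt_embedding)
print(len(fused))
```

A downstream classifier would consume `fused` directly. Plain concatenation treats both views as equally reliable for every sample, which is the limitation the abstract's reliability weighting is meant to address.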
From: Mohd Akhtar Mujtaba
[v1] Mon, 15 Jun 2026 22:37:26 UTC (2,065 KB)