Leveraging Local and Global Knowledge Integration with Time-Frequency Calibrated Distillation for Speech Enhancement
Jiaming Cheng, Ruiyu Liang, Ye Ni, Chao Xu, Jing Li, Wei Zhou, R · 2025-06-16 · via cs.SD updates on arXiv.org

In this paper, we propose an intra-set and inter-set recursive fusion framework with time-frequency calibrated knowledge distillation (I$^2$SRF-TFCKD) for speech enhancement (SE). Unlike previous distillation strategies for SE, the proposed framework fully exploits the time-frequency differential information of speech while facilitating both local information focusing and global knowledge circulation. First, we construct a collaborative distillation paradigm for intra-set and inter-set correlations. Within a correlated set, multi-layer teacher-student features are pairwise matched for calibrated distillation. We then generate representative features from each correlated set through recursive fusion, forming a fused feature set that enables inter-set knowledge interaction. Second, we propose a multi-layer interactive distillation based on dual-stream time-frequency cross-calibration: it calculates teacher-student similarity calibration weights in the time and frequency domains separately and then cross-weights them, so that distillation contributions are allocated across layers according to speech characteristics. The proposed distillation strategy is applied to the dual-path dilated convolutional recurrent network (DPDCRN), which ranked first in the SE track of the L3DAS23 challenge. To evaluate the effectiveness of I$^2$SRF-TFCKD, we conduct experiments on both single-channel and multi-channel SE datasets. Objective evaluations demonstrate that the proposed knowledge distillation (KD) strategy consistently improves the performance of the low-complexity student model and outperforms other distillation schemes.