Efficient Test-Time Adaptation through Latent Subspace Coefficients Search
Xinyu Luo, Jie Liu, Kecheng Chen, Junyi Yang, Bo Ding, Arindam B · 2025-10-13 · via eess.AS updates on arXiv.org

Real-world deployment often exposes models to distribution shifts, making test-time adaptation (TTA) critical for robustness. Yet most TTA methods are ill-suited to edge deployment: they rely on backpropagation, activation buffering, or test-time mini-batches, which incur high latency and memory overhead. We propose ELaTTA (Efficient Latent Test-Time Adaptation), a gradient-free framework for single-instance TTA under strict on-device constraints. ELaTTA freezes the model weights and adapts each test sample by optimizing a low-dimensional coefficient vector in a principal latent subspace induced by the source data; this subspace is pre-computed offline via truncated SVD and stored with negligible overhead. At inference, ELaTTA optimizes the k-dimensional coefficients with CMA-ES to encourage confident predictions. Because CMA-ES samples candidates from a Gaussian search distribution, this effectively optimizes a Gaussian-smoothed objective and improves stability near decision boundaries. Across six benchmarks and multiple architectures, ELaTTA achieves state-of-the-art accuracy under both strict and continual single-instance protocols, while reducing compute by up to 63× and peak memory by up to 11×. We further demonstrate on-device deployment on a ZYNQ-7020 platform.
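The abstract gives no implementation details for the offline step, so the following is only a minimal, dependency-free sketch of how a source-induced principal latent subspace could be pre-computed, assuming latent features are available as plain Python lists. Instead of calling a library SVD routine, it swaps in power iteration with deflation (by orthogonal projection) on the sample covariance; for centered features the resulting directions match, up to sign, the top-k right singular vectors a truncated SVD would return. The function name `principal_subspace` and the synthetic data are hypothetical.

```python
import math
import random


def principal_subspace(features, k, iters=200, seed=0):
    """Top-k principal directions of `features` (a list of equal-length rows).

    Power iteration with deflation on the sample covariance: for centered
    features these directions match, up to sign, the top-k right singular
    vectors of a truncated SVD. Returns (mean, basis).
    """
    n, d = len(features), len(features[0])
    mean = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in features]
    # d x d sample covariance C = X^T X / n of the centered features
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    basis = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            # Deflation: project out the directions that were already found
            for b in basis:
                dot = sum(wi * bi for wi, bi in zip(w, b))
                w = [wi - dot * bi for wi, bi in zip(w, b)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            if norm < 1e-12:
                raise ValueError("k exceeds the rank of the feature covariance")
            v = [wi / norm for wi in w]
        basis.append(v)
    return mean, basis


# Synthetic "source" features with most variance along the first axis
rng = random.Random(1)
source = [[rng.gauss(0, 3.0), rng.gauss(0, 0.5), rng.gauss(0, 0.1)]
          for _ in range(200)]
mean, basis = principal_subspace(source, k=2)
# Storing the subspace costs only d + k * d floats, independent of dataset size
print([round(x, 3) for x in basis[0]])
```

Since only the mean and k basis vectors are kept, the stored footprint grows with k·d rather than with the number of source samples, which is consistent with the abstract's "negligible overhead" remark.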
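A minimal sketch of the per-sample, gradient-free step, under several assumptions the abstract does not state: the adapted feature is the additive shift z + Σ c_i·u_i along the stored basis vectors u_i, the confidence objective is softmax entropy plus a small L2 penalty on c, and the frozen head is a hypothetical linear classifier. A plain isotropic (μ, λ) evolution strategy with a fixed geometric step-size decay stands in for CMA-ES, which additionally adapts a full covariance matrix and the step size; a real deployment would more likely use an existing CMA-ES implementation such as the `cma` Python package. Because candidates are drawn from N(m, σ²I), averaging the best candidates roughly tracks the Gaussian-smoothed objective E[f(m + σε)] rather than f itself, one way to read the abstract's stability claim. All names (`objective`, `es_search`, the head, and the test feature) are hypothetical.

```python
import math
import random


def softmax_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits); lower = more confident."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # shift for numerical stability
    total = sum(exps)
    return -sum((e / total) * math.log(e / total) for e in exps if e > 0.0)


def objective(coeffs, z, basis, weights, bias, l2=0.01):
    """Entropy of the frozen head on z + sum_i c_i * u_i, plus an L2 penalty on c.

    Hypothetical formulation: the additive shift and the penalty are assumptions.
    """
    shifted = list(z)
    for c, u in zip(coeffs, basis):
        shifted = [s + c * ui for s, ui in zip(shifted, u)]
    logits = [sum(w * s for w, s in zip(row, shifted)) + b
              for row, b in zip(weights, bias)]
    return softmax_entropy(logits) + l2 * sum(c * c for c in coeffs)


def es_search(f, k, generations=30, popsize=16, parents=4, sigma=0.5,
              decay=0.95, seed=0):
    """Isotropic (mu, lambda) evolution strategy; a simplified stand-in for CMA-ES."""
    rng = random.Random(seed)
    mean = [0.0] * k
    best, best_val = list(mean), f(mean)  # c = 0 is the unadapted sample
    for _ in range(generations):
        # Sample lambda candidate coefficient vectors from N(mean, sigma^2 I)
        pop = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
               for _ in range(popsize)]
        scored = sorted(((f(c), c) for c in pop), key=lambda t: t[0])
        if scored[0][0] < best_val:
            best_val, best = scored[0]
        # Recombination: the new mean averages the mu best candidates
        elite = [c for _, c in scored[:parents]]
        mean = [sum(c[i] for c in elite) / parents for i in range(k)]
        sigma *= decay  # fixed decay; CMA-ES adapts the step size instead
    return best, best_val


# Hypothetical frozen 2-class linear head and one test feature
weights = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
z = [0.2, 0.0]
basis = [[1.0, 0.0]]  # hypothetical k = 1 principal direction


def f(c):
    return objective(c, z, basis, weights, bias)


coeffs, value = es_search(f, k=1)
print(f"objective before: {f([0.0]):.3f}, after: {value:.3f}, c = {coeffs}")
```

Starting the search at c = 0 and keeping the best candidate seen guarantees the returned objective is never worse than that of the unadapted sample. Unconstrained entropy minimization can also push a sample across a decision boundary, which is why this sketch penalizes large coefficients.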