

PHALAR: Phasors for Learned Musical Audio Representations
Davide Marin · 2026-05-06 · via cs.LG updates on arXiv.org


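Abstract: Stem retrieval, the task of matching missing stems to a given audio submix, is a key challenge currently limited by models that discard temporal information. We introduce PHALAR, a contrastive framework that surpasses the state of the art, achieving relative accuracy gains of up to $\approx 70\%$ while requiring $<50\%$ of the parameters and training $7\times$ faster. By leveraging a learned spectral pooling layer and a complex-valued head, PHALAR enforces pitch-equivariant and phase-equivariant biases. PHALAR establishes a new retrieval state of the art on MoisesDB, Slakh, and ChocoChorales, and correlates significantly more strongly with human coherence judgments than semantic baselines. Finally, zero-shot beat tracking and linear chord probing confirm that PHALAR captures robust musical structure beyond the retrieval task.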
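The abstract does not spell out how the complex-valued head yields a phase-equivariant bias. As a rough illustration only, the sketch below checks one common reading of the term: a bias-free complex-linear map commutes with a global phase rotation, so rotating the input by $e^{i\theta}$ rotates the output by the same factor while leaving output magnitudes unchanged. The function names, the 4-to-3 layer size, and the random weights are all hypothetical and are not taken from PHALAR.

```python
import cmath
import random


def complex_linear(weights, x):
    """Bias-free complex-linear map: y[i] = sum_j weights[i][j] * x[j]."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]


def rotate_phase(x, theta):
    """Apply a global phase shift by multiplying every entry by e^{i*theta}."""
    rot = cmath.exp(1j * theta)
    return [rot * xj for xj in x]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(p - q) for p, q in zip(a, b))


random.seed(0)

# Hypothetical sizes (4 inputs, 3 outputs); not taken from the paper.
W = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(4)]
     for _ in range(3)]
x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(4)]
theta = 0.7

# Equivariance: rotating then mapping equals mapping then rotating.
shift_then_map = complex_linear(W, rotate_phase(x, theta))
map_then_shift = rotate_phase(complex_linear(W, x), theta)
equivariance_error = max_abs_diff(shift_then_map, map_then_shift)

# Invariance of magnitudes: |y| is unaffected by the global phase shift.
mag_before = [abs(v) for v in complex_linear(W, x)]
mag_after = [abs(v) for v in shift_then_map]
invariance_error = max_abs_diff(mag_before, mag_after)

print(equivariance_error, invariance_error)
```

Both errors should sit at floating-point round-off level. Adding a bias term or a real-valued nonlinearity applied separately to real and imaginary parts would generally break this property, which is one reason a complex-valued head can be a natural place to encode such a bias.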
Comments: Accepted at ICML 2026
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Signal Processing (eess.SP)
Cite as: arXiv:2605.03929 [cs.SD]
  (or arXiv:2605.03929v4 [cs.SD] for this version)
  https://doi.org/10.48550/arXiv.2605.03929

arXiv-issued DOI via DataCite

Submission history

From: Davide Marincione
[v1] Tue, 5 May 2026 16:19:58 UTC (3,943 KB)
[v2] Wed, 6 May 2026 09:27:42 UTC (3,940 KB)
[v3] Sat, 9 May 2026 11:18:22 UTC (3,943 KB)
[v4] Tue, 26 May 2026 17:01:23 UTC (4,418 KB)