Learning from Biased and Costly Data Sources: Minimax-optimal Data Collection under a Budget
Michael O. Harding, Vikas Singh, Kirthevasan Kandasamy · 2026-02-20 · via cs.LG updates on arXiv.org

Data collection is a critical component of modern statistical and machine learning pipelines, particularly when data must be gathered from multiple heterogeneous sources to study a target population of interest. In many use cases, such as medical studies or political polling, different sources incur different sampling costs. Observations often have associated group identities (for example, health markers, demographics, or political affiliations), and the relative composition of these groups may differ substantially, both among the source populations and between the sources and the target population. In this work, we study multi-source data collection under a fixed budget, focusing on the estimation of population means and group-conditional means. We show that naive data collection strategies (e.g., attempting to "match" the target distribution) and standard estimators (e.g., the sample mean) can be highly suboptimal. Instead, we develop a sampling plan that maximizes the effective sample size: the total sample size divided by $D_{\chi^2}(q \,\|\, \overline{p}) + 1$, where $q$ is the target distribution, $\overline{p}$ is the aggregated source distribution, and $D_{\chi^2}$ is the $\chi^2$-divergence. We pair this sampling plan with a classical post-stratification estimator and upper-bound its risk. We provide matching lower bounds, establishing that our approach achieves the budgeted minimax-optimal risk. Our techniques also extend to prediction problems in which the goal is to minimize the excess risk, providing a principled approach to multi-source learning with costly and heterogeneous data sources.
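
To make the effective-sample-size criterion concrete, here is a minimal Python sketch, not the authors' implementation. It assumes that the aggregated source distribution $\overline{p}$ is the sample-size-weighted mixture of the per-source group distributions, and that the post-stratification estimator reweights per-group sample means by the target proportions $q$. The function names, the two toy sources, and their per-sample costs are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def aggregated_distribution(
    source_dists: Sequence[Sequence[float]], sample_sizes: Sequence[float]
) -> List[float]:
    """Mixture of source group distributions weighted by sample size.

    Assumption: this is how the aggregated distribution p-bar is formed.
    """
    total = sum(sample_sizes)
    num_groups = len(source_dists[0])
    return [
        sum(n * dist[g] for n, dist in zip(sample_sizes, source_dists)) / total
        for g in range(num_groups)
    ]


def chi2_divergence(q: Sequence[float], p: Sequence[float]) -> float:
    """Chi-squared divergence D(q || p) = sum_g (q_g - p_g)^2 / p_g."""
    div = 0.0
    for q_g, p_g in zip(q, p):
        if p_g == 0.0:
            if q_g > 0.0:
                raise ValueError("target group has zero mass under p")
            continue
        div += (q_g - p_g) ** 2 / p_g
    return div


def effective_sample_size(
    q: Sequence[float],
    source_dists: Sequence[Sequence[float]],
    sample_sizes: Sequence[float],
) -> float:
    """Total sample size divided by (chi-squared divergence + 1)."""
    p_bar = aggregated_distribution(source_dists, sample_sizes)
    return sum(sample_sizes) / (chi2_divergence(q, p_bar) + 1.0)


def post_stratified_mean(
    q: Sequence[float], samples: Sequence[Tuple[int, float]]
) -> float:
    """Weight each group's sample mean by its target proportion q_g."""
    sums = [0.0] * len(q)
    counts = [0] * len(q)
    for group, value in samples:
        sums[group] += value
        counts[group] += 1
    estimate = 0.0
    for g, q_g in enumerate(q):
        if q_g == 0.0:
            continue
        if counts[g] == 0:
            raise ValueError(f"no observations for group {g}")
        estimate += q_g * sums[g] / counts[g]
    return estimate


if __name__ == "__main__":
    q = [0.5, 0.5]                       # target group proportions
    sources = [[0.8, 0.2], [0.2, 0.8]]   # hypothetical sources A and B
    # Hypothetical costs: A costs 1 per sample, B costs 3; the budget is 200.
    match_plan = [50, 50]    # cost 50*1 + 50*3 = 200, matches q exactly
    cheap_plan = [200, 0]    # cost 200*1 = 200, all from the cheap source
    print(effective_sample_size(q, sources, match_plan))
    print(effective_sample_size(q, sources, cheap_plan))
    print(post_stratified_mean(q, [(0, 1.0), (0, 3.0), (1, 10.0)]))
```

In this toy setup, the plan that matches the target distribution spends the budget on 100 samples and has an effective sample size of 100, while sampling only from the cheaper source yields 200 samples and an effective sample size of about 128. This is consistent with the abstract's claim that matching the target can be suboptimal under a budget, although neither plan is claimed to be the optimum.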