The Model Parking Tax: Quantifying the Hidden Energy Cost of Always-On GPU Model Deployment
Sai Sathvik · 2026-05-26 · via cs.LG updates on arXiv.org

Abstract: The AI inference industry keeps models loaded in GPU memory around the clock to avoid cold-start latency, implicitly treating idle power as a fixed cost of readiness. Yet the structure of this cost has never been empirically decomposed, and never across GPU architectures. We present the first cross-architecture measurement of idle GPU power as a function of VRAM allocation, combining 18 days of production telemetry (335,267 samples, 14 H100 GPUs) with controlled dose-response experiments on three GPU architectures spanning three memory technologies: NVIDIA H100 (HBM3, 80 GB), A100 (HBM2e, 80 GB), and L40S (GDDR6, 48 GB). We observe that idle power is piecewise constant on all three architectures: creating a CUDA context forces a discrete DVFS transition that adds 26-66 W over bare idle (26-50 W on the HBM architectures, 66 W on GDDR6), while the marginal effect of VRAM allocation is bounded below measurement relevance ($|\beta| < 0.02$ W/GB) on every device tested. The CUDA context accounts for >98% of the parking tax regardless of memory technology. We validate this finding with a real HuggingFace model (Qwen2.5-7B) on all three architectures, confirming a difference of <0.5 W from empty tensors on every device, and we capture cold-start power profiles during model loading. We derive a cold-start breakeven model showing that energy-optimal behavior depends on request arrival rate and loading latency, not on model size, with breakeven intervals of 1-5 minutes. Our results identify a constraint consistent across all tested architectures: idle-with-context power is determined by DVFS state, not memory occupancy.
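The abstract does not give the breakeven model itself, so the following is only a minimal sketch of one way such a calculation could look, not the authors' formulation. It assumes idle power is bare idle plus a fixed CUDA-context step plus a small per-GB slope, and that a cold start draws a constant power for a fixed loading time. The class name, function names, and all numeric values are hypothetical and chosen only to fall inside the ranges quoted in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuIdleProfile:
    """Idle-power parameters for one GPU. All values used below are illustrative."""
    bare_idle_w: float     # power draw with no CUDA context
    context_step_w: float  # discrete jump once a CUDA context exists
    beta_w_per_gb: float   # marginal power per GB of VRAM allocated


def idle_power_w(profile: GpuIdleProfile, context_open: bool, vram_gb: float) -> float:
    """Piecewise-constant idle power: a context step plus a small VRAM slope."""
    if not context_open:
        return profile.bare_idle_w
    return (profile.bare_idle_w
            + profile.context_step_w
            + profile.beta_w_per_gb * vram_gb)


def parking_tax_w(profile: GpuIdleProfile, vram_gb: float) -> float:
    """Extra power paid for keeping a model resident instead of fully unloaded."""
    return idle_power_w(profile, True, vram_gb) - idle_power_w(profile, False, 0.0)


def breakeven_interval_s(profile: GpuIdleProfile, vram_gb: float,
                         load_time_s: float, load_power_w: float) -> float:
    """Idle gap beyond which unloading and cold-starting costs less energy than parking.

    Assumed model: over a gap T, parking costs tax * T extra joules, while a
    cold start costs (load_power - bare_idle) * load_time extra joules.
    """
    cold_start_extra_j = (load_power_w - profile.bare_idle_w) * load_time_s
    return cold_start_extra_j / parking_tax_w(profile, vram_gb)


if __name__ == "__main__":
    # Hypothetical device: 70 W bare idle, 50 W context step, 0.01 W/GB slope.
    gpu = GpuIdleProfile(bare_idle_w=70.0, context_step_w=50.0, beta_w_per_gb=0.01)
    tax = parking_tax_w(gpu, vram_gb=40.0)
    gap = breakeven_interval_s(gpu, vram_gb=40.0, load_time_s=60.0, load_power_w=250.0)
    print(f"parking tax: {tax:.1f} W, context share: {gpu.context_step_w / tax:.1%}")
    print(f"breakeven idle gap: {gap / 60:.1f} min")
```

Under these assumptions the VRAM term contributes almost nothing to the tax, so the breakeven gap is set by loading time and loading power rather than by how much memory the model occupies, which is the qualitative behavior the abstract reports.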
Comments: 7 pages, 3 figures, 5 tables
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Performance (cs.PF)
Cite as: arXiv:2605.23918 [cs.DC]
  (or arXiv:2605.23918v1 [cs.DC] for this version)
  https://doi.org/10.48550/arXiv.2605.23918

arXiv-issued DOI via DataCite

Submission history

From: Sai Sathvik Vadari
[v1] Wed, 15 Apr 2026 09:01:24 UTC (423 KB)