惯性聚合: efficiently track and read the blogs, news, and tech updates you care about

Scheduled Jacobian Chaining
Simon Märtens, Uwe Naumann · 2025-05-09 · via cs.DC updates on arXiv.org

This paper addresses the efficient computation of Jacobian matrices for programs composed of sequential differentiable subprograms. By representing the overall Jacobian as a chain product of the Jacobians of these subprograms, we reduce the problem to optimizing the order in which these matrices are multiplied, known as the Jacobian Matrix Chain Product problem. Solutions to this problem yield "optimal bracketings", which induce a precedence-constrained scheduling problem. We investigate the inherent parallelism in the solutions and develop a new dynamic programming heuristic that takes this scheduling into account. To assess its performance, we benchmark it against the global optimum, which is computed via a branch-and-bound algorithm.
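Written out, the reduction looks roughly as follows. The notation (F_i, n_i, C_{j,i}) is assumed here and is not taken from the paper. The program F is a composition of q differentiable subprograms, so by the chain rule its Jacobian is the product of their Jacobians. The last line covers only the simplest dense setting, where every elemental Jacobian is formed explicitly and an (a × b)·(b × c) product costs a·b·c multiply-adds. In that setting, the minimal cost C_{j,i} of the sub-chain F'_j ⋯ F'_i obeys the classic matrix-chain recurrence. The paper's Jacobian Matrix Chain Product formulation may admit further evaluation options that this recurrence does not capture.

```latex
\begin{aligned}
F   &= F_q \circ F_{q-1} \circ \cdots \circ F_1,
    && F_i : \mathbb{R}^{n_{i-1}} \to \mathbb{R}^{n_i},\\
F'  &= F_q' \cdot F_{q-1}' \cdots F_1',
    && F_i' \in \mathbb{R}^{n_i \times n_{i-1}},\\
C_{j,i} &= \min_{i \le k < j} \bigl( C_{j,k+1} + C_{k,i} + n_j \, n_k \, n_{i-1} \bigr),
    && C_{i,i} = 0.
\end{aligned}
```

The optimal total cost is C_{q,1}, and the minimizing split points k define an optimal bracketing.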
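A minimal Python sketch of bracketing optimization for a Jacobian chain follows. It assumes every elemental Jacobian F'_i is a dense n[i] × n[i-1] matrix and counts an (a × b)·(b × c) product as a·b·c multiply-adds. This is the classic dense matrix-chain special case, not the full Jacobian Matrix Chain Product formulation. The names `optimal_bracketing` and `evaluate` are hypothetical.

With `parallel=True`, the sketch minimizes the critical path of the bracketing tree instead of the total work. That equals the makespan when independent sub-products run on an unlimited number of processors. It only illustrates the parallelism a bracketing exposes. It is not the authors' scheduling heuristic, which would also have to respect a finite number of machines.

```python
from functools import lru_cache
from typing import List, Optional, Tuple, Union

# A bracketing tree: a leaf i stands for the elemental Jacobian F'_i,
# a pair (left, right) stands for the product left · right.
Tree = Union[int, Tuple["Tree", "Tree"]]


def optimal_bracketing(n: List[int], parallel: bool = False) -> Tuple[int, Tree]:
    """Optimal bracketing of the dense chain F' = F'_q ... F'_1 (q >= 1).

    F'_i is assumed to be a dense n[i] x n[i-1] matrix, so len(n) == q + 1.
    parallel=False: minimise the total number of multiply-adds.
    parallel=True:  minimise the critical path of the bracketing tree
                    (makespan with unlimited processors; illustration only).
    Returns (optimal cost, bracketing tree).
    """
    q = len(n) - 1

    @lru_cache(maxsize=None)
    def best(j: int, i: int) -> Tuple[int, Tree]:
        # Cheapest way to evaluate the sub-chain F'_j ... F'_i (j >= i).
        if i == j:
            return 0, i
        result: Optional[Tuple[int, Tree]] = None
        for k in range(i, j):
            # Split into (F'_j ... F'_{k+1}) · (F'_k ... F'_i), an
            # (n[j] x n[k]) by (n[k] x n[i-1]) product.
            cost_l, tree_l = best(j, k + 1)
            cost_r, tree_r = best(k, i)
            fma = n[j] * n[k] * n[i - 1]
            if parallel:
                # The two sub-chains are independent and can run concurrently.
                cost = max(cost_l, cost_r) + fma
            else:
                cost = cost_l + cost_r + fma
            if result is None or cost < result[0]:
                result = (cost, (tree_l, tree_r))
        assert result is not None
        return result

    return best(q, 1)


def evaluate(tree: Tree, n: List[int]) -> Tuple[int, int, int, int]:
    """Return (rows, cols, total multiply-adds, critical path) of a bracketing."""
    if isinstance(tree, int):
        return n[tree], n[tree - 1], 0, 0
    rows_l, cols_l, total_l, path_l = evaluate(tree[0], n)
    rows_r, cols_r, total_r, path_r = evaluate(tree[1], n)
    if cols_l != rows_r:
        raise ValueError("incompatible Jacobian shapes")
    fma = rows_l * cols_l * cols_r
    return rows_l, cols_r, total_l + total_r + fma, max(path_l, path_r) + fma
```

Consider four square 10 × 10 Jacobians (`n = [10] * 5`). Every bracketing needs 3000 multiply-adds in total. With `parallel=True`, however, the sketch selects the balanced bracketing `((4, 3), (2, 1))`, i.e. (F'_4 F'_3)(F'_2 F'_1). Its two inner products are independent, so its critical path is 2000 rather than 3000. For the non-square chain `n = [60, 5, 30, 10]`, the total-work objective picks `((3, 2), 1)` at 4500 multiply-adds instead of 27000 for the other bracketing.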