Hybrid-Parallel: Achieving High Performance and Energy Efficient Distributed Inference on Robots
Zekai Sun, Xiuxian Guan, Junming Wang, Haoze Song, Yuhao Qing, T · 2024-05-30 · via cs.DC updates on arXiv.org

Rapid advances in machine learning have led to significant achievements in a range of real-world robotic tasks. These tasks rely heavily on fast and energy-efficient inference of deep neural network (DNN) models deployed on robots. To improve inference performance, distributed inference has emerged as a promising approach: it parallelizes inference across multiple powerful GPU devices in modern data centers using techniques such as data parallelism, tensor parallelism, and pipeline parallelism. However, when deployed on real-world robots, existing parallel methods fail to deliver low inference latency or meet energy requirements because of the limited bandwidth of robotic IoT. We present Hybrid-Parallel, a high-performance distributed inference system optimized for robotic IoT. Hybrid-Parallel takes a fine-grained approach, parallelizing inference at the granularity of local operators within DNN layers (i.e., operators that can be computed independently from partial input, such as a convolution kernel in a convolution layer). This allows operators from different layers to be computed and transmitted concurrently, and overlaps the computation and transmission phases within the same inference task. The evaluation demonstrates that Hybrid-Parallel reduces inference time by 14.9% to 41.1% and energy consumption per inference by up to 35.3% compared to state-of-the-art baselines.
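To make the benefit of overlapping computation and transmission concrete, here is a minimal timing sketch. It is not Hybrid-Parallel's scheduler: it is a toy two-stage pipeline model, and the function names, the grouping of one layer into four operator groups, and all millisecond figures are hypothetical values chosen for illustration.

```python
from typing import Sequence


def sequential_latency(compute: Sequence[float], transmit: Sequence[float]) -> float:
    """Compute every chunk first, then send everything (no overlap)."""
    return sum(compute) + sum(transmit)


def overlapped_latency(compute: Sequence[float], transmit: Sequence[float]) -> float:
    """Two-stage pipeline: a chunk is sent once it is computed and the link is free."""
    if len(compute) != len(transmit):
        raise ValueError("need one transmit time per compute chunk")
    compute_done = 0.0  # time at which the current chunk finishes computing
    link_free = 0.0     # time at which the network link becomes idle again
    for c, t in zip(compute, transmit):
        compute_done += c
        start = max(compute_done, link_free)  # wait for both the data and the link
        link_free = start + t
    return link_free


# Hypothetical numbers: one layer split into 4 independently computable operator groups.
compute_ms = [2.0, 2.0, 2.0, 2.0]
transmit_ms = [3.0, 3.0, 3.0, 3.0]

seq = sequential_latency(compute_ms, transmit_ms)
ovl = overlapped_latency(compute_ms, transmit_ms)
print(f"sequential: {seq} ms, overlapped: {ovl} ms")
```

With these made-up figures the sequential schedule takes 20 ms while the overlapped one takes 14 ms, because all but the first compute step is hidden behind transmission. In this simple model the gain grows as the work is split into finer pieces, which is the intuition behind operating at the granularity of local operators rather than whole layers; real systems would also have to account for per-message overhead, which the sketch ignores.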