惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

Y
Y Combinator Blog
博客园 - 司徒正美
TaoSecurity Blog
TaoSecurity Blog
Martin Fowler
Martin Fowler
T
Threat Research - Cisco Blogs
Blog — PlanetScale
Blog — PlanetScale
S
Secure Thoughts
博客园 - 三生石上(FineUI控件)
K
KPMG report finds enterprise disconnect between AI and its ROI | CIO
K
Kaspersky official blog
cs.CL updates on arXiv.org
cs.CL updates on arXiv.org
Cisco Talos Blog
Cisco Talos Blog
H
Help Net Security
博客园 - 叶小钗
爱范儿
爱范儿
GbyAI
GbyAI
I
Intezer
M
MIT News - Artificial intelligence
Latest news
Latest news
Schneier on Security
Schneier on Security
T
Tor Project blog
Simon Willison's Weblog
Simon Willison's Weblog
I
InfoQ
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
Cyber Security Advisories - MS-ISAC
Cyber Security Advisories - MS-ISAC
C
CXSECURITY Database RSS Feed - CXSecurity.com
罗磊的独立博客
N
News and Events Feed by Topic
T
The Blog of Author Tim Ferriss
V2EX - 技术
V2EX - 技术
B
Blog
T
Tailwind CSS Blog
N
Netflix TechBlog - Medium
Security Latest
Security Latest
V
V2EX
F
Fortinet All Blogs
Forbes - Security
Forbes - Security
Application and Cybersecurity Blog
Application and Cybersecurity Blog
The Hacker News
The Hacker News
Scott Helme
Scott Helme
P
Privacy International News Feed
P
Palo Alto Networks Blog
H
Heimdal Security Blog
C
Cisco Blogs
T
The Exploit Database - CXSecurity.com
博客园 - Franky
酷 壳 – CoolShell
酷 壳 – CoolShell
G
Google Developers Blog
W
WeLiveSecurity
L
LINUX DO - 最新话题

cs.AI updates on arXiv.org

Policy Split: Incentivizing Dual-Mode Exploration in LLM Reinforcement with Dual-Mode Entropy Regularization METER: Evaluating Multi-Level Contextual Causal Reasoning in Large Language Models Think Before you Write: QA-Guided Reasoning for Character Descriptions in Books METRO: Towards Strategy Induction from Expert Dialogue Transcripts for Non-collaborative Dialogues Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning Do LLMs Know Tool Irrelevance? Demystifying Structural Alignment Bias in Tool Invocations Enhancing Multimodal Large Language Models for Ancient Chinese Character Evolution Analysis via Glyph-Driven Fine-Tuning Exploring Knowledge Conflicts for Faithful LLM Reasoning: Benchmark and Method CocoaBench: Evaluating Unified Digital Agents in the Wild MathAgent: Adversarial Evolution of Constraint Graphs for Mathematical Reasoning Data Synthesis Efficient Training for Cross-lingual Speech Language Models Shared Emotion Geometry Across Small Language Models: A Cross-Architecture Study of Representation, Behavior, and Methodological Confounds A Systematic Analysis of the Impact of Persona Steering on LLM Capabilities Uncertainty-Aware Web-Conditioned Scientific Fact-Checking When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies When Verification Fails: How Compositionally Infeasible Claims Escape Rejection Mem$^2$Evolve: Towards Self-Evolving Agents via Co-Evolutionary Capability Expansion and Experience Distillation AOP-Smart: A RAG-Enhanced Large Language Model Framework for Adverse Outcome Pathway Analysis Advancing Polish Language Modeling through Tokenizer Optimization in the Bielik v3 7B and 11B Series TInR: Exploring Tool-Internalized Reasoning in Large Language Models Do BERT Embeddings Encode Narrative Dimensions? A Token-Level Probing Analysis of Time, Space, Causality, and Character in Fiction Generating Multiple-Choice Knowledge Questions with Interpretable Difficulty Estimation using Knowledge Graphs and Large Language Models Deep-Reporter: Deep Research for Grounded Multimodal Long-Form Generation Too Nice to Tell the Truth: Quantifying Agreeableness-Driven Sycophancy in Role-Playing Language Models Learning and Enforcing Context-Sensitive Control for LLMs Efficient Process Reward Modeling via Contrastive Mutual Information Computational Lesions in Multilingual Language Models Separate Shared and Language-specific Brain Alignment Bridging Linguistic Gaps: Cross-Lingual Mapping in Pre-Training and Dataset for Enhanced Multilingual LLM Performance Early Decisions Matter: Proximity Bias and Initial Trajectory Shaping in Non-Autoregressive Diffusion Language Models LLMs Should Incorporate Explicit Mechanisms for Human Empathy ReFEree: Reference-Free and Fine-Grained Method for Evaluating Factual Consistency in Real-World Code Summarization From Query to Counsel: Structured Reasoning with a Multi-Agent Framework and Dataset for Legal Consultation CodaRAG: Connecting the Dots with Associativity Inspired by Complementary Learning Think in Sentences: Explicit Sentence Boundaries Enhance Language Model's Capabilities CircuitSynth: Reliable Synthetic Data Generation ASPIRin: Action Space Projection for Interactivity-Optimized Reinforcement Learning in Full-Duplex Speech Language Models Computational Implementation of a Model of Category-Theoretic Metaphor Comprehension CoSToM:Causal-oriented Steering for Intrinsic Theory-of-Mind Alignment in Large Language Models Should We be Pedantic About Reasoning Errors in Machine Translation? GIANTS: Generative Insight Anticipation from Scientific Literature SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning Many-Tier Instruction Hierarchy in LLM Agents Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories PhysInOne: Visual Physics Learning and Reasoning in One Suite Neural Distribution Prior for LiDAR Out-of-Distribution Detection Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition PDE-regularized Dynamics-informed Diffusion with Uncertainty-aware Filtering for Long-Horizon Dynamics Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection Regime-Conditional Retrieval: Theory and a Transferable Router for Two-Hop QA Identification and Anonymization of Named Entities in Unstructured Information Sources for Use in Social Engineering Detection Hypergraph Neural Networks Accelerate MUS Enumeration ASTRA: Adaptive Semantic Tree Reasoning Architecture for Complex Table Question Answering Neighbourhood Transformer: Switchable Attention for Monophily-Aware Graph Learning Aligned Agents, Biased Swarm: Measuring Bias Amplification in Multi-Agent Systems WOMBET: World Model-Based Experience Transfer for Robust and Sample-efficient Reinforcement Learning Adaptive Dual Residual U-Net with Attention Gate and Multiscale Spatial Attention Mechanisms (ADRUwAMS) Revisiting the Capacity Gap in Chain-of-Thought Distillation from a Practical Perspective A Mathematical Framework for Temporal Modeling and Counterfactual Policy Simulation of Student Dropout Temporal Dropout Risk in Learning Analytics: A Harmonized Survival Benchmark Across Dynamic and Early-Window Representations MedFormer-UR: Uncertainty-Routed Transformer for Medical Image Classification Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs SenBen: Sensitive Scene Graphs for Explainable Content Moderation eBandit: Kernel-Driven Reinforcement Learning for Adaptive Video Streaming Accelerating Transformer-Based Monocular SLAM via Geometric Utility Scoring Deep Learning-Based Tracking and Lineage Reconstruction of Ligament Breakup Every Response Counts: Quantifying Uncertainty of LLM-based Multi-Agent Systems through Tensor Decomposition 3D-VCD: Hallucination Mitigation in 3D-LLM Embodied Agents through Visual Contrastive Decoding On Semiotic-Grounded Interpretive Evaluation of Generative Art Evidential Transformation Network: Turning Pretrained Models into Evidential Models for Post-hoc Uncertainty Estimation QARIMA: A Quantum Approach To Classical Time Series Analysis StructRL: Recovering Dynamic Programming Structure from Learning Dynamics in Distributional Reinforcement Learning From Selection to Scheduling: Federated Geometry-Aware Correction Makes Exemplar Replay Work Better under Continual Dynamic Heterogeneity Detection of Hate and Threat in Digital Forensics: A Case-Driven Multimodal Approach Semantic Intent Fragmentation: A Single-Shot Compositional Attack on Multi-Agent AI Pipelines Joint Interference Detection and Identification via Adversarial Multi-task Learning Springdrift: An Auditable Persistent Runtime for LLM Agents with Case-Based Memory, Normative Safety, and Ambient Self-Perception From Dispersion to Attraction: Spectral Dynamics of Hallucination Across Whisper Model Scales AlphaLab: Autonomous Multi-Agent Research Across Optimization Domains with Frontier LLMs Act or Escalate? Evaluating Escalation Behavior in Automation with Language Models Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers Multivariate Time Series Anomaly Detection via Dual-Branch Reconstruction and Autoregressive Flow-based Residual Density Estimation On the Spectral Geometry of Cross-Modal Representations: A Functional Map Diagnostic for Multimodal Alignment Structured Exploration and Exploitation of Label Functions for Automated Data Annotation MolPaQ: Modular Quantum-Classical Patch Learning for Interpretable Molecular Generation QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation Generating High Quality Synthetic Data for Dutch Medical Conversations Re-Mask and Redirect: Exploiting Denoising Irreversibility in Diffusion Language Models Reinforcement-aware Knowledge Distillation for LLM Reasoning SubQuad: Near-Quadratic-Free Structure Inference with Distribution-Balanced Objectives in Adaptive Receptor framework A Horizon-Aware Decision-Support Framework for Demand Forecasting Model Selection in Resilient Production Planning H-AdminSim: A Multi-Agent Simulator for Realistic Hospital Administrative Workflows with FHIR Integration AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts Reasoning Models Will Sometimes Lie About Their Reasoning Multi-agent Adaptive Mechanism Design Relational Visual Similarity From Navigation to Refinement: Revealing the Two-Stage Nature of Flow-based Diffusion Models through Oracle Velocity On-the-Fly Adaptation to Quantization: Configuration-Aware LoRA for Efficient Fine-Tuning of Quantized LLMs STCast: Adaptive Boundary Alignment for Global and Regional Weather Forecasting HCAST: Human-Calibrated Autonomy Software Tasks OmniPrism: Learning Disentangled Visual Concept for Image Generation
FedAdamW: A Communication-Efficient Optimizer with Convergence and Generalization Guarantees for Federated Large Models
Junkang Liu, Fanhua Shang, Hongying Liu, Yuxuan Tian, Yuanyuan L · 2025-10-31 · via cs.AI updates on arXiv.org

AdamW has become one of the most effective optimizers for training large-scale models. We have also observed its effectiveness in the context of federated learning (FL). However, directly applying AdamW in federated learning settings poses significant challenges: (1) due to data heterogeneity, AdamW often yields high variance in the second-moment estimate $\boldsymbol{v}$; (2) the local overfitting of AdamW may cause client drift; and (3) Reinitializing moment estimates ($\boldsymbol{v}$, $\boldsymbol{m}$) at each round slows down convergence. To address these challenges, we propose the first \underline{Fed}erated \underline{AdamW} algorithm, called \texttt{FedAdamW}, for training and fine-tuning various large models. \texttt{FedAdamW} aligns local updates with the global update using both a \textbf{local correction mechanism} and decoupled weight decay to mitigate local overfitting. \texttt{FedAdamW} efficiently aggregates the \texttt{mean} of the second-moment estimates to reduce their variance and reinitialize them. Theoretically, we prove that \texttt{FedAdamW} achieves a linear speedup convergence rate of $\mathcal{O}(\sqrt{(L Δσ_l^2)/(S K R ε^2)}+(L Δ)/R)$ without \textbf{heterogeneity assumption}, where $S$ is the number of participating clients per round, $K$ is the number of local iterations, and $R$ is the total number of communication rounds. We also employ PAC-Bayesian generalization analysis to explain the effectiveness of decoupled weight decay in local training. Empirically, we validate the effectiveness of \texttt{FedAdamW} on language and vision Transformer models. Compared to several baselines, \texttt{FedAdamW} significantly reduces communication rounds and improves test accuracy. The code is available in https://github.com/junkangLiu0/FedAdamW.