

Token-Level LLM Collaboration via FusionRoute
Nuoya Xiong, Yuhang Zhou, Hanqing Zeng, Zhaorun Chen, Furong Hua · 2026-01-09 · via cs.CL updates on arXiv.org

Large language models (LLMs) exhibit strengths across diverse domains. However, achieving strong performance across these domains with a single general-purpose model typically requires scaling to sizes that are prohibitively expensive to train and deploy. On the other hand, while smaller domain-specialized models are much more efficient, they struggle to generalize beyond their training distributions. To address this dilemma, we propose FusionRoute, a robust and effective token-level multi-LLM collaboration framework in which a lightweight router simultaneously (i) selects the most suitable expert at each decoding step and (ii) contributes a complementary logit that refines or corrects the selected expert's next-token distribution via logit addition. Unlike existing token-level collaboration methods that rely solely on fixed expert outputs, we provide a theoretical analysis showing that pure expert-only routing is fundamentally limited: unless strong global coverage assumptions hold, it cannot in general realize the optimal decoding policy. By augmenting expert selection with a trainable complementary generator, FusionRoute expands the effective policy class and enables recovery of optimal value functions under mild conditions. Empirically, across both Llama-3 and Gemma-2 families and diverse benchmarks spanning mathematical reasoning, code generation, and instruction following, FusionRoute outperforms both sequence- and token-level collaboration, model merging, and direct fine-tuning, while remaining competitive with domain experts on their respective tasks.
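
The decoding step described in the abstract (a router picks one expert per token and adds its own complementary logit to that expert's logits) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Router` interface, the hard argmax expert selection, the greedy token choice, and the toy experts over a three-token vocabulary are all assumptions made for illustration only.

```python
import math
from typing import Callable, List, Sequence, Tuple

Logits = List[float]
# An expert maps a token prefix to next-token logits (hypothetical interface).
Expert = Callable[[Sequence[int]], Logits]
# A router maps a prefix to (one score per expert, complementary logits).
# This signature is an assumption, not taken from the paper.
Router = Callable[[Sequence[int]], Tuple[List[float], Logits]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fusion_step(
    prefix: Sequence[int], experts: Sequence[Expert], router: Router
) -> Tuple[int, int, List[float]]:
    """One decoding step: select an expert, add the router's logits, pick a token."""
    scores, complementary = router(prefix)
    # (i) Select the highest-scoring expert for this step.
    k = max(range(len(experts)), key=lambda i: scores[i])
    expert_logits = experts[k](prefix)
    # (ii) Refine the expert's distribution via logit addition.
    fused = [e + c for e, c in zip(expert_logits, complementary)]
    probs = softmax(fused)
    # Greedy choice, assumed here for simplicity.
    token = max(range(len(probs)), key=lambda i: probs[i])
    return k, token, probs


def decode(
    prefix: Sequence[int], experts: Sequence[Expert], router: Router, steps: int
) -> List[int]:
    """Run `steps` fused decoding steps, appending each chosen token."""
    out = list(prefix)
    for _ in range(steps):
        _, token, _ = fusion_step(out, experts, router)
        out.append(token)
    return out


# Toy stand-ins (hypothetical): two fixed "experts" and a hand-written router.
def expert_a(prefix: Sequence[int]) -> Logits:
    return [2.0, 1.0, 0.0]


def expert_b(prefix: Sequence[int]) -> Logits:
    return [0.0, 1.0, 2.0]


def toy_router(prefix: Sequence[int]) -> Tuple[List[float], Logits]:
    if len(prefix) % 2 == 0:
        # Prefer expert_a, but boost token 1 over the expert's own top choice.
        return [1.0, 0.0], [0.0, 1.5, 0.0]
    # Prefer expert_b and leave its logits unchanged.
    return [0.0, 1.0], [0.0, 0.0, 0.0]


if __name__ == "__main__":
    print(decode([], [expert_a, expert_b], toy_router, steps=3))
```

In the first step of this toy run, `expert_a` alone would pick token 0, but the added logit shifts the choice to token 1. That is the sense in which the fused policy can produce outputs that routing over fixed expert outputs cannot.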