When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions
Wei Xia, Haoqing Wang, Zhi-Hong Deng, Yehui Tang · 2026-05-20 · via cs.CL updates on arXiv.org

Chain-of-thought (CoT) reasoning has become the default strategy for enhancing LLM capabilities, yet its application raises a fundamental question: when is explicit reasoning actually beneficial? Empirical evidence reveals a striking paradox: CoT often provides marginal or even negative gains on factual and open-ended tasks while multiplying token consumption. In this work, we show that LLM reasoning is not a static property of tasks or models, but a dynamic decoding state that emerges during generation. Through systematic analysis, we find that early-stage entropy dynamics provide a reliable signal of this state: tasks that benefit from CoT exhibit consistent entropy reduction, while others display unstable or increasing patterns. This behavior can be interpreted as a phase-transition-like shift from a high-entropy exploratory regime to a low-entropy structured reasoning regime. Based on these insights, we propose EDRM (Entropy Dynamics-based Reasoning Manifold), a lightweight and training-free routing framework that leverages early decoding entropy to adaptively select inference strategies. EDRM embeds entropy trajectories into a compact and interpretable manifold representation, enabling both zero-shot deployment and fine-grained instance-level adaptation. Across 15 benchmarks and 4 LLMs of varying scales and architectures, EDRM consistently outperforms static baselines. At the dataset level, EDRM achieves a 41–55% token reduction while improving accuracy with as few as 50 calibration samples. At the instance level, it further improves accuracy by up to 4.7% while maintaining 27–45% token savings. These results suggest that reasoning should be invoked selectively rather than by default, and demonstrate the effectiveness of entropy-driven decoding control for efficient and adaptive LLM inference.
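
As a rough illustration of the early-entropy signal described in the abstract, the Python sketch below computes the Shannon entropy of each early next-token distribution and routes a query to CoT only when that entropy trends downward. This is not the authors' EDRM implementation: the abstract does not specify how the manifold embedding or calibration works, so the function names, the least-squares slope used as a trend summary, and the `threshold` parameter are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_slope(entropies: Sequence[float]) -> float:
    """Least-squares slope of entropy against decoding step index.

    Assumption: a linear trend stands in for the paper's trajectory
    representation, which the abstract does not describe in detail.
    """
    n = len(entropies)
    if n < 2:
        raise ValueError("need at least two decoding steps")
    mean_t = (n - 1) / 2.0
    mean_h = sum(entropies) / n
    cov = sum((t - mean_t) * (h - mean_h) for t, h in enumerate(entropies))
    var = sum((t - mean_t) ** 2 for t in range(n))
    return cov / var


def route(step_probs: List[Sequence[float]], threshold: float = 0.0) -> str:
    """Return "cot" if early entropy trends downward, else "direct".

    `threshold` is a hypothetical knob; in practice it would be tuned
    on a small calibration set.
    """
    entropies = [token_entropy(p) for p in step_probs]
    return "cot" if entropy_slope(entropies) < threshold else "direct"


if __name__ == "__main__":
    # Toy distributions over a 4-token vocabulary, one per early decoding step.
    falling = [[0.25, 0.25, 0.25, 0.25], [0.7, 0.1, 0.1, 0.1], [0.97, 0.01, 0.01, 0.01]]
    rising = list(reversed(falling))
    print(route(falling), route(rising))
```

In a real setting the per-step distributions would come from a softmax over the model's logits for the first few generated tokens, and the routing rule would likely be richer than a single slope threshold.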