

Hide to See: Reasoning-prefix Masking for Visual-anchored Thinking in VLM Distillation
Seonghoon Yu · 2026-05-13 · via cs.CL updates on arXiv.org


Abstract: Recent think-answer approaches in VLMs, such as Qwen3-VL-Thinking, boost reasoning performance by leveraging intermediate thinking steps before the final answer, but their computational cost becomes substantial, especially for larger VLMs. To distill such capabilities into compact think-answer VLMs, a primary objective is to improve the student's ability to use visual evidence throughout its reasoning trace, since long think-answer traces suffer from visual forgetting. To this end, we introduce a novel think-answer distillation framework that encourages the student to anchor its thinking on visual information by masking the student's salient reasoning prefixes. To compensate for the masked textual cues, the student is encouraged to rely more on visual evidence as an alternative source of information during distillation. Our masking strategies include: 1) token-wise salient reasoning-prefix masking, which selectively masks high-influence reasoning prefixes for each next-token prediction, and 2) self-paced masking budget scheduling, which gradually increases the masking scale according to distillation difficulty, measured by the discrepancy between the teacher and student distributions. In the distillation phase, the student is guided by our salient reasoning-prefix mask, which blocks both future tokens and salient reasoning cues, in place of the standard causal mask used for auto-regressive language modeling. Experimental results show that our approach outperforms recent open-source VLMs, VLM distillation, and self-distillation methods on multimodal reasoning benchmarks, while further analyses confirm enhanced visual utilization along the student's thinking process.
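The two masking strategies named in the abstract can be illustrated with a small pure-Python sketch. This is not the authors' implementation; the abstract does not specify how saliency is computed or what the schedule looks like, so everything below is an assumption made for illustration: the saliency scores are taken as a given input matrix, the masked set is chosen by top-k, the discrepancy is measured with KL divergence, and the schedule (`masking_ratio`, with hypothetical `max_ratio` and `scale` parameters) masks more as the teacher-student discrepancy shrinks.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def masking_ratio(discrepancy, max_ratio=0.5, scale=1.0):
    """Hypothetical self-paced schedule (an assumption, not from the paper):
    the smaller the teacher-student discrepancy, the larger the masked fraction."""
    return max_ratio * math.exp(-discrepancy / scale)


def salient_prefix_mask(saliency, reasoning_span, ratio):
    """Build a boolean attention mask: allowed[t][j] is True if query t may attend to key j.

    saliency[t][j]  -- assumed influence of prefix token j on the prediction at t
    reasoning_span  -- (start, end) half-open index range of reasoning tokens
    ratio           -- fraction of the visible reasoning prefix to mask per query
    """
    n = len(saliency)
    start, end = reasoning_span
    # Start from the standard causal mask: no access to future tokens.
    allowed = [[j <= t for j in range(n)] for t in range(n)]
    for t in range(n):
        # Only reasoning tokens strictly before t are candidates, so visual
        # tokens and the current position always stay visible.
        candidates = list(range(start, min(end, t)))
        k = int(ratio * len(candidates))
        most_salient = sorted(candidates, key=lambda j: saliency[t][j], reverse=True)[:k]
        for j in most_salient:
            allowed[t][j] = False
    return allowed


# Toy sequence: positions 0-1 stand in for visual tokens, 2-5 for reasoning tokens.
saliency = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.3, 0.3, 0.0, 0.0, 0.0, 0.0],
    [0.2, 0.2, 0.6, 0.0, 0.0, 0.0],
    [0.1, 0.1, 0.7, 0.1, 0.0, 0.0],
    [0.1, 0.1, 0.2, 0.9, 0.3, 0.0],
]
teacher = [0.5, 0.5]
student = [0.5, 0.5]
ratio = masking_ratio(kl_divergence(teacher, student))
mask = salient_prefix_mask(saliency, reasoning_span=(2, 6), ratio=ratio)
```

In this toy setup each query position loses access to its most salient earlier reasoning tokens while the stand-in visual positions remain attendable, which is the effect the abstract describes as pushing the student toward visual evidence.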
Comments: Pre-print
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as: arXiv:2605.11651 [cs.CV]
  (or arXiv:2605.11651v4 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2605.11651

arXiv-issued DOI via DataCite

Submission history

From: Seonghoon Yu
[v1] Tue, 12 May 2026 07:14:04 UTC (6,058 KB)
[v2] Wed, 13 May 2026 01:49:55 UTC (6,058 KB)
[v3] Fri, 15 May 2026 06:49:33 UTC (6,058 KB)
[v4] Tue, 26 May 2026 04:36:02 UTC (6,058 KB)