
The Cost of Language: Centroid Erasure Exposes and Exploits Modal Competition in Multimodal Language Models
Akshay Paruchuri · 2026-04-17 · via cs.CL updates on arXiv.org


Abstract: Multimodal language models systematically underperform on visual perception tasks, yet the structure underlying this failure remains poorly understood. We propose centroid replacement, collapsing each token to its nearest K-means centroid, as a controlled probe for modal dependence. Across seven models spanning three architecture families, erasing text centroid structure costs 4$\times$ more accuracy than erasing visual centroid structure, exposing a universal imbalance where language representations overshadow vision even on tasks that demand visual reasoning. We exploit this asymmetry through text centroid contrastive decoding, recovering up to +16.9% accuracy on individual tasks by contrastively decoding against a text-centroid-erased reference. This intervention varies meaningfully with training approaches: standard fine-tuned models show larger gains (+5.6% on average) than preference-optimized models (+1.5% on average). Our findings suggest that modal competition is structurally localized, correctable at inference time without retraining, and quantifiable as a diagnostic signal to guide future multimodal training.
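The two operations named in the abstract, centroid replacement and contrastive decoding against a centroid-erased reference, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state which representations are clustered, how K is chosen, or the exact decoding formula. The function names (`kmeans`, `centroid_replace`, `contrastive_logits`), the weight `alpha`, and the `(1 + alpha) * full - alpha * erased` form (a common contrastive-decoding convention) are all assumptions made for illustration.

```python
import numpy as np


def nearest_centroid(x, centroids):
    """Index of the closest centroid (squared Euclidean) for each row of x."""
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = nearest_centroid(x, centroids)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def centroid_replace(x, centroids):
    """Collapse every token vector to its nearest centroid."""
    return centroids[nearest_centroid(x, centroids)]


def contrastive_logits(full, erased, alpha=1.0):
    """Assumed contrastive form: push away from the erased reference."""
    return (1.0 + alpha) * full - alpha * erased


# Toy usage: 200 random 8-dimensional "token" vectors, K = 4 (hypothetical sizes).
tokens = np.random.default_rng(1).normal(size=(200, 8))
centroids = kmeans(tokens, k=4)
erased = centroid_replace(tokens, centroids)
```

After replacement, `erased` has the same shape as `tokens` but contains at most K distinct rows, which is the sense in which the within-cluster structure is erased. In a real model, `full` and `erased` would be next-token logits from the unmodified input and from the input with text centroids erased, respectively.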
Comments: 29 pages, 9 figures, 19 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2604.14363 [cs.CL]
  (or arXiv:2604.14363v1 [cs.CL] for this version)
  https://doi.org/10.48550/arXiv.2604.14363

arXiv-issued DOI via DataCite

Submission history

From: Akshay Paruchuri
[v1] Wed, 15 Apr 2026 19:26:30 UTC (3,154 KB)