Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution
Jacob Dineen, Aswin RRV, Zhikun Xu, Ben Zhou · 2026-04-04 · via cs.CL updates on arXiv.org

Co-evolutionary self-play, where one language model generates problems and another solves them, promises autonomous curriculum learning without human supervision. In practice, the proposer quickly converges to a narrow distribution of problems that satisfy the reward function. This diversity collapse renders the curriculum uninformative for the solver, stalling the co-evolutionary loop. We introduce vocabulary dropout, a random mask applied to the proposer's output logits during both policy training and curriculum generation, as a lightweight mechanism to sustain diversity. The mask is hard and non-stationary, preventing the proposer from locking into fixed token sequences. Training Qwen3-4B and Qwen3-8B on mathematical reasoning via R-Zero, we find that vocabulary dropout sustains proposer diversity across lexical, semantic, and functional metrics throughout training, and yields solver improvements averaging +4.4 points at 8B, with the largest gains on competition-level benchmarks. Our findings suggest that explicit action-space constraints, analogous to the structural role that game rules play in classical self-play, can help sustain productive co-evolution in language. Vocabulary dropout is one simple instantiation of this principle.
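
The masking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract states only that the mask is random, hard, and non-stationary, and that it is applied to the proposer's output logits. The drop probability, the choice to resample the mask at every decoding step, the protected `keep_ids`, and all function names below are assumptions made for illustration.

```python
import math
import random


def sample_vocab_mask(vocab_size, drop_prob, rng, keep_ids=()):
    """Draw a fresh random mask; True means the token stays available.

    drop_prob and keep_ids are illustrative assumptions, not values
    or options taken from the paper.
    """
    mask = [rng.random() >= drop_prob for _ in range(vocab_size)]
    for i in keep_ids:  # hypothetical: protect ids such as end-of-sequence
        mask[i] = True
    if not any(mask):  # guard so that at least one token can be sampled
        mask[rng.randrange(vocab_size)] = True
    return mask


def apply_hard_mask(logits, mask):
    """Hard mask: dropped tokens get -inf, so their probability is exactly 0."""
    return [x if keep else float("-inf") for x, keep in zip(logits, mask)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


# Toy decoding loop over a 6-token vocabulary with made-up logits.
rng = random.Random(0)
logits = [2.0, 1.0, 0.5, 0.1, -1.0, 0.3]
for step in range(3):
    # Non-stationary (assumed here to mean per-step): resample the mask.
    mask = sample_vocab_mask(len(logits), drop_prob=0.5, rng=rng)
    probs = softmax(apply_hard_mask(logits, mask))
    token = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    assert mask[token]  # a dropped token can never be sampled
```

Because the mask is hard rather than a soft penalty, a masked token cannot be produced no matter how strongly the proposer prefers it, and because the mask changes between draws, no fixed token sequence is always available. In a real setup the same operation would be applied to the model's logit tensor during both policy training and curriculum generation, as the abstract describes.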