惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

Cyberwarzone
Cyberwarzone
T
The Blog of Author Tim Ferriss
人人都是产品经理
人人都是产品经理
博客园 - 叶小钗
博客园_首页
量子位
B
Blog RSS Feed
H
Help Net Security
aimingoo的专栏
aimingoo的专栏
F
Fortinet All Blogs
D
DataBreaches.Net
云风的 BLOG
云风的 BLOG
罗磊的独立博客
K
Kaspersky official blog
S
Securelist
C
Cyber Attacks, Cyber Crime and Cyber Security
P
Palo Alto Networks Blog
I
Intezer
Know Your Adversary
Know Your Adversary
S
Security Affairs
B
Blog
Engineering at Meta
Engineering at Meta
Recent Commits to openclaw:main
Recent Commits to openclaw:main
G
GRAHAM CLULEY
T
The Exploit Database - CXSecurity.com
L
LINUX DO - 热门话题
T
Threat Research - Cisco Blogs
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
P
Privacy International News Feed
Cisco Talos Blog
Cisco Talos Blog
T
Tor Project blog
Scott Helme
Scott Helme
Simon Willison's Weblog
Simon Willison's Weblog
Help Net Security
Help Net Security
A
Arctic Wolf
NISL@THU
NISL@THU
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
IT之家
IT之家
爱范儿
爱范儿
有赞技术团队
有赞技术团队
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
V
Vulnerabilities – Threatpost
The Hacker News
The Hacker News
博客园 - 聂微东
I
InfoQ
Schneier on Security
Schneier on Security
Recent Announcements
Recent Announcements
GbyAI
GbyAI
D
Darknet – Hacking Tools, Hacker News & Cyber Security
小众软件
小众软件

cs.CL updates on arXiv.org

DharmaOCR: Specialized Small Language Models for Structured OCR that outperform Open-Source and Commercial Baselines Learning Adaptive Reasoning Paths for Efficient Visual Reasoning AIM: Asymmetric Information Masking for Visual Question Answering Continual Learning RaTA-Tool: Retrieval-based Tool Selection with Multimodal Large Language Models MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation Neuro-Oracle: A Trajectory-Aware Agentic RAG Framework for Interpretable Epilepsy Surgical Prognosis The Cost of Language: Centroid Erasure Exposes and Exploits Modal Competition in Multimodal Language Models Rethinking Patient Education as Multi-turn Multi-modal Interaction Knowing When Not to Answer: Evaluating Abstention in Multimodal Reasoning Systems Reasoning Dynamics and the Limits of Monitoring Modality Reliance in Vision-Language Models ADAPT: Benchmarking Commonsense Planning under Unspecified Affordance Constraints OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis One RL to See Them All: Visual Triple Unified Reinforcement Learning VisRet: Visualization Improves Knowledge-Intensive Text-to-Image Retrieval HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding Counting Without Numbers and Finding Without Words KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality POP: Prefill-Only Pruning for Efficient Large Model Inference ConfLayers: Adaptive Confidence-based Layer Skipping for Self-Speculative Decoding LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning What Is the Minimum Architecture for Prolepsis? Early Irrevocable Commitment Across Tasks in Small Transformers AdaSplash-2: Faster Differentiable Sparse Attention Can Large Language Models Detect Methodological Flaws? Evidence from Gesture Recognition for UAV-Based Rescue Operation Based on Deep Learning Decoupling Scores and Text: The Politeness Principle in Peer Review Correcting Suppressed Log-Probabilities in Language Models with Post-Transformer Adapters Indexing Multimodal Language Models for Large-scale Image Retrieval UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments Caption First, VQA Second: Knowledge Density, Not Task Format, Drives Multimodal Scaling PersonaVLM: Long-Term Personalized Multimodal LLMs MERRIN: A Benchmark for Multimodal Evidence Retrieval and Reasoning in Noisy Web Environments MedRCube: A Multidimensional Framework for Fine-Grained and In-Depth Evaluation of MLLMs in Medical Imaging Who Gets Flagged? The Pluralistic Evaluation Gap in AI Content Watermarking Reward Design for Physical Reasoning in Vision-Language Models When 'YES' Meets 'BUT': Can Large Models Comprehend Contradictory Humor Through Comparative Reasoning? Failure Makes the Agent Stronger: Enhancing Accuracy through Structured Reflection for Reliable Tool Interactions Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions RadAgents: Multimodal Agentic Reasoning for Chest X-ray Interpretation with Radiologist-like Workflows Chain of Uncertain Rewards with Large Language Models for Reinforcement Learning (How) Learning Rates Regulate Catastrophic Overtraining Parameter Importance is Not Static: Evolving Parameter Isolation for Supervised Fine-Tuning $\pi$-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space A Domain-Specific Language for LLM-Driven Trigger Generation in Multimodal Data Collection The Consciousness Cluster: Emergent preferences of Models that Claim to be Conscious KMMMU: Evaluation of Massive Multi-discipline Multimodal Understanding in Korean Language and Context Dental-TriageBench: Benchmarking Multimodal Reasoning for Hierarchical Dental Triage Lossless Prompt Compression via Dictionary-Encoding and In-Context Learning: Enabling Cost-Effective LLM Analysis of Repetitive Data Detection Without Correction: A Robust Asymmetry in Activation-Based Hallucination Probing LiveClawBench: Benchmarking LLM Agents on Complex, Real-World Assistant Tasks Better and Worse with Scale: How Contextual Entrainment Diverges with Model Size Functional Emotions or Situational Contexts? A Discriminating Test from the Mythos Preview System Card C2: Scalable Rubric-Augmented Reward Modeling from Binary Preferences Calibrated Speculative Decoding: Frequency-Guided Candidate Selection for Efficient Inference WorkRB: A Community-Driven Evaluation Framework for AI in the Work Domain Text-as-Signal: Quantitative Semantic Scoring with Embeddings, Logprobs, and Noise Reduction A Multi-Model Approach to English-Bangla Sentiment Classification of Government Mobile Banking App Reviews A Proactive EMR Assistant for Doctor-Patient Dialogue: Streaming ASR, Belief Stabilization, and Preliminary Controlled Evaluation Token Statistics Reveal Conversational Drift in Multi-turn LLM Interaction Mathematical Reasoning Enhanced LLM for Formula Derivation: A Case Study on Fiber NLI Modellin Red Skills or Blue Skills? A Dive Into Skills Published on ClawHub Correct Chains, Wrong Answers: Dissociating Reasoning from Output in LLM Logic Curation of a Palaeohispanic Dataset for Machine Learning EVE: A Domain-Specific LLM Framework for Earth Intelligence OmniTrace: A Unified Framework for Generation-Time Attribution in Omni-Modal LLMs DeEscalWild: A Real-World Benchmark for Automated De-Escalation Training with SLMs Document-tuning for robust alignment to animals Can Large Language Models Reliably Extract Physiology Index Values from Coronary Angiography Reports? IWLV-Ramayana: A Sarga-Aligned Parallel Corpus of Valmiki's Ramayana Across Indian Languages Unleashing Implicit Rewards: Prefix-Value Learning for Distribution-Level Optimization InfiniteScienceGym: An Unbounded, Procedurally-Generated Benchmark for Scientific Analysis Evaluating the Evaluator: Problems with SemEval-2020 Task 1 for Lexical Semantic Change Detection Hessian-Enhanced Token Attribution (HETA): Interpreting Autoregressive LLMs L2D-Clinical: Learning to Defer for Adaptive Model Selection in Clinical Text Classification English is Not All You Need: Systematically Exploring the Role of Multilinguality in LLM Post-Training Giving Voice to the Constitution: Low-Resource Text-to-Speech for Quechua and Spanish Using a Bilingual Legal Corpus AgentSPEX: An Agent SPecification and EXecution Language Peer-Predictive Self-Training for Language Model Reasoning TLoRA+: A Low-Rank Parameter-Efficient Fine-Tuning Method for Large Language Models Empirical Evidence of Complexity-Induced Limits in Large Language Models on Finite Discrete State-Space Problems with Explicit Validity Constraints From Prediction to Justification: Aligning Sentiment Reasoning with Human Rationale via Reinforcement Learning CANVAS: Continuity-Aware Narratives via Visual Agentic Storyboarding Using reasoning LLMs to extract SDOH events from clinical notes ToolSpec: Accelerating Tool Calling via Schema-Aware and Retrieval-Augmented Speculative Decoding Synthesizing Instruction-Tuning Datasets with Contrastive Decoding Debate to Align: Reliable Entity Alignment through Two-Stage Multi-Agent Debate Training-Free Test-Time Contrastive Learning for Large Language Models YOCO++: Enhancing YOCO with KV Residual Connections for Efficient LLM Inference MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning BenGER Platform: A Collaborative Web Platform for End-to-End Benchmarking of German Legal Tasks Foresight Optimization for Strategic Reasoning in Large Language Models Syn-TurnTurk: A Synthetic Dataset for Turn-Taking Prediction in Turkish Dialogues IndicDB -- Benchmarking Multilingual Text-to-SQL Capabilities in Indian Languages Breaking the Generator Barrier: Disentangled Representation for Generalizable AI-Text Detection Beyond Arrow's Impossibility: Fairness as an Emergent Property of Multi-Agent Collaboration Co-FactChecker: A Framework for Human-AI Collaborative Claim Verification Using Large Reasoning Models Learning the Cue or Learning the Word? Analyzing Generalization in Metaphor Detection for Verbs
How reliable are LLMs when it comes to playing dice?
[Submitted on 5 Jun 2026] · 2026-06-08 · via cs.CL updates on arXiv.org
arXiv:2606.07515v2 Announce Type: replace-cross Abstract: We investigate the probabilistic reasoning capabili…