

$D^2$-Monitor: Dynamic Safety Monitoring for Diffusion LLMs via Hesitation-Aware Routing
Aoxi Liu, Yu · 2026-05-26 · via cs.AI updates on arXiv.org


Abstract: Despite the emergence of diffusion large language models (D-LLMs) as an alternative to autoregressive large language models (AR-LLMs), safety monitoring for D-LLMs remains largely unexplored. Unlike AR-LLMs, D-LLMs generate text through a multi-step denoising process, exposing intermediate hidden representations that may contain safety-relevant information unavailable in standard single-step monitoring setups. Motivated by the suitability of lightweight probes for always-on monitoring, we analyze which trajectory-level signals best indicate when such probes are likely to struggle. We find that the most informative signal is safety hesitation: intermediate hidden states repeatedly falling within a small margin of the probe's decision boundary. The number of such hesitation steps in a D-LLM's trajectory effectively predicts probe failure, providing a proxy for sample difficulty. Building on this analysis, we propose $D^2$-Monitor, a bi-level safety monitor for D-LLMs. $D^2$-Monitor uses a lightweight probe as an always-on monitor that jointly estimates hesitation and performs base classification. When the hesitation level exceeds a threshold, a more expressive but computationally heavier probe is activated. This dynamic routing mechanism allocates monitoring resources efficiently at test time. Evaluated on 3 datasets (WildguardMix, ToxicChat, OpenAI-Moderation) across 4 D-LLMs, $D^2$-Monitor achieves state-of-the-art performance with a compact parameter footprint ($\leq$ 0.85M parameters) and exhibits the best trade-off between effectiveness and efficiency relative to 8 baselines.
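The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the probe architecture, how per-step scores are aggregated, or the threshold values. The sketch assumes a linear lightweight probe whose raw score stands in for distance to the decision boundary, a mean-score rule for the base classification, and hypothetical names throughout (`linear_score`, `count_hesitation`, `route`, `margin`, `max_hesitation`, `heavy_probe`).

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def linear_score(w: Vector, b: float, h: Vector) -> float:
    """Score of a hypothetical linear probe; the sign gives the predicted class."""
    return sum(wi * hi for wi, hi in zip(w, h)) + b


def count_hesitation(scores: Sequence[float], margin: float) -> int:
    """Count denoising steps whose score lies within `margin` of the boundary (0)."""
    return sum(1 for s in scores if abs(s) < margin)


def route(
    trajectory: List[Vector],
    w: Vector,
    b: float,
    heavy_probe: Callable[[List[Vector]], bool],
    margin: float = 0.5,        # assumed value, not taken from the paper
    max_hesitation: int = 2,    # assumed value, not taken from the paper
) -> Tuple[bool, int, bool]:
    """Return (unsafe, hesitation_steps, escalated) for one denoising trajectory."""
    if not trajectory:
        raise ValueError("trajectory must contain at least one hidden state")

    # The lightweight probe runs on every intermediate hidden state.
    scores = [linear_score(w, b, h) for h in trajectory]
    hesitation = count_hesitation(scores, margin)

    # Too many near-boundary steps: hand the sample to the heavier probe.
    if hesitation > max_hesitation:
        return heavy_probe(trajectory), hesitation, True

    # Otherwise keep the cheap decision (mean score is an assumed aggregation).
    return sum(scores) / len(scores) > 0.0, hesitation, False


if __name__ == "__main__":
    w, b = [1.0, -1.0], 0.0
    clear = [[2.0, 0.0], [3.0, 0.5], [2.5, 0.0]]
    hesitant = [[0.1, 0.0], [0.0, 0.2], [0.3, 0.1], [1.0, 0.9]]
    print(route(clear, w, b, heavy_probe=lambda t: False))
    print(route(hesitant, w, b, heavy_probe=lambda t: False))
```

In this sketch the heavy probe is only invoked for trajectories that hover near the boundary, so its cost is paid on a subset of inputs, which is the efficiency argument the abstract makes for dynamic routing.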
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2605.25893 [cs.AI]
  (or arXiv:2605.25893v1 [cs.AI] for this version)
  https://doi.org/10.48550/arXiv.2605.25893

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Aoxi Liu
[v1] Mon, 25 May 2026 14:22:21 UTC (706 KB)