Think Twice Before You Act: Enhancing Agent Behavioral Safety with Thought Correction

Changyue Jia · 2026-05-27 · via cs.AI updates on arXiv.org
