Multi-Agent Reinforcement Learning for Safe Autonomous Driving Under Pedestrian Behavioral Uncertainty
Prakash Aryan · 2026-05-21 · via cs.LG updates on arXiv.org


Abstract: Simulation-based testing of self-driving cars (SDCs) typically relies on scripted pedestrian models that do not capture the heterogeneity and uncertainty of real crossing behavior. This limits the realism of safety assessments, especially for jaywalking, which is governed by latent personality traits the vehicle cannot observe. We hypothesize that jointly training pedestrians and the SDC with multi-agent reinforcement learning (MARL) yields more realistic interaction scenarios than training against fixed pedestrian policies, and that the behavior gap between predictable and unpredictable crossings can be measured directly from trajectories. We co-train an SDC and 12 pedestrians using Multi-Agent Proximal Policy Optimization (MAPPO): pedestrian locomotion follows scripted Dijkstra pathfinding while an RL policy controls high-level go/wait decisions, and jaywalking probability depends on a per-pedestrian trait sampled at episode start and hidden from the SDC. In 500-episode evaluations, the co-trained SDC reached 78% of goals with a 14% collision rate, versus 35% of goals and a 33% collision rate for the best rule-based baseline. A speed differential metric shows the SDC traveled 2.65 m/s faster near jaywalkers than near crosswalk users at close range (0-3 m), indicating that jaywalking encounters were not anticipated. Jaywalking accounted for 13% of crossing events but 62% of collisions, and co-training reduced collisions by 30% relative to single-agent RL as pedestrians learned to wait when the SDC approached at speed.
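The abstract does not say how the speed differential metric is computed, so the following is only a minimal sketch of one plausible reading: the mean SDC speed over timesteps within the 0-3 m band of a jaywalker, minus the mean over timesteps within the same band of a crosswalk user. The `Encounter` record, its field names, and both function names are hypothetical and not taken from the paper. The second function turns the 13% / 62% shares into a per-event risk ratio, under the added assumption that every collision occurs during a crossing event.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Encounter:
    """One logged timestep of an SDC-pedestrian interaction (hypothetical schema)."""
    distance_m: float      # SDC-to-pedestrian distance at this timestep
    sdc_speed_mps: float   # SDC speed at this timestep
    jaywalking: bool       # True if the pedestrian is crossing outside a crosswalk


def speed_differential(samples, near=0.0, far=3.0):
    """Mean SDC speed near jaywalkers minus mean speed near crosswalk users.

    Only samples whose distance lies in [near, far] metres are used.
    """
    in_band = [s for s in samples if near <= s.distance_m <= far]
    jay = [s.sdc_speed_mps for s in in_band if s.jaywalking]
    cross = [s.sdc_speed_mps for s in in_band if not s.jaywalking]
    if not jay or not cross:
        raise ValueError("need at least one sample of each crossing type in the band")
    return mean(jay) - mean(cross)


def per_event_risk_ratio(event_share, collision_share):
    """Collision rate per jaywalking event relative to per non-jaywalking event.

    Assumes all collisions happen during crossing events (an assumption made
    here for illustration, not a claim from the abstract).
    """
    jay_rate = collision_share / event_share
    other_rate = (1.0 - collision_share) / (1.0 - event_share)
    return jay_rate / other_rate


# Toy data, invented for illustration only.
toy = [
    Encounter(1.0, 6.0, True),
    Encounter(2.5, 5.0, True),
    Encounter(1.5, 3.0, False),
    Encounter(2.0, 2.0, False),
    Encounter(10.0, 9.0, True),   # outside the 0-3 m band, ignored
]
differential = speed_differential(toy)
ratio = per_event_risk_ratio(0.13, 0.62)
```

Under that assumption, the reported shares imply that a jaywalking crossing was on the order of ten times more likely to end in a collision than a crosswalk crossing, which is consistent with the abstract's reading that these encounters were not anticipated.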
Comments: Accepted to ICRA 2026 Workshop "8th Workshop on Long-term Human Motion Prediction"
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Robotics (cs.RO)
Cite as: arXiv:2605.20255 [cs.LG]
  (or arXiv:2605.20255v2 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2605.20255

arXiv-issued DOI via DataCite

Submission history

From: Prakash Aryan
[v1] Mon, 18 May 2026 12:02:41 UTC (457 KB)
[v2] Mon, 25 May 2026 19:49:33 UTC (415 KB)