SparseWorld: Enhancing End-to-End Autonomous Driving via World Models with Sparse Scene Representation
Ruoyu Wang · 2026-05-26 · via cs updates on arXiv.org

Abstract: Recently, world models have made significant progress in enhancing end-to-end driving systems through both future situation forecasting and improved scene understanding. However, existing driving world models are typically built upon dense scene representations, causing high computational costs and redundant information. In this paper, we present SparseWorld, a lightweight world model that focuses on predicting only the critical layout of the scene, enabling efficient future forecasting for end-to-end driving systems. SparseWorld first performs autoregressive rollout to forecast future map elements and surrounding agents, enabling the model to learn how driving scenarios evolve over time. It then leverages these predicted futures to refine downstream motion prediction and trajectory planning. Specifically, we propose a Sparse Dreamer that anticipates future instances in the latent space through joint temporal and spatial attention. By interacting with predicted future instances, the motion planner captures more accurate motion patterns and generates more informed and safety-aware trajectories. Extensive experiments demonstrate that SparseWorld significantly reduces collision risk and achieves state-of-the-art performance on the open-loop planning metrics of the nuScenes dataset with a collision rate of 0.05%. Moreover, it substantially outperforms the baseline method in closed-loop planning metrics on the Bench2Drive benchmark. Supplementary material is available at the project page: this https URL.
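The abstract's description of autoregressive rollout over sparse instances with joint temporal and spatial attention can be illustrated with a toy sketch. This is not the authors' Sparse Dreamer: the function names (`dream_step`, `rollout`), the tensor layout (time, instances, feature dimension), the residual combination, and the absence of learned projection weights are all assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention; q: (..., Nq, D), k and v: (..., Nk, D)."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def dream_step(history):
    """Predict one future frame of instance tokens (hypothetical helper).

    history: array of shape (T, N, D) holding N sparse instance tokens
    (e.g. map elements and agents) for each of T past frames.
    """
    per_instance = np.swapaxes(history, 0, 1)   # (N, T, D)
    query = per_instance[:, -1:, :]             # latest token of each instance
    # Temporal attention: each instance attends over its own past tokens.
    temporal = attention(query, per_instance, per_instance)[:, 0, :]  # (N, D)
    # Spatial attention: the instances of the new frame attend to each other.
    spatial = attention(temporal, temporal, temporal)                 # (N, D)
    # Residual sum; a real model would use learned projections and an MLP.
    return temporal + spatial

def rollout(history, steps):
    """Autoregressive rollout: feed each predicted frame back as input."""
    frames = []
    for _ in range(steps):
        nxt = dream_step(history)
        frames.append(nxt)
        history = np.concatenate([history, nxt[None]], axis=0)
    return np.stack(frames)                     # (steps, N, D)
```

Calling `rollout(history, steps=4)` on a `(3, 5, 8)` history array returns a `(4, 5, 8)` array of predicted instance tokens, which a downstream planner could attend to. Only a few tokens are carried per frame rather than a dense grid, which is the kind of saving the abstract attributes to sparse representations.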
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2605.24354 [cs.CV]
  (or arXiv:2605.24354v1 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2605.24354

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Ruoyu Wang
[v1] Sat, 23 May 2026 02:30:33 UTC (4,596 KB)