Using Ensemble Diffusion to Estimate Uncertainty for End-to-End Autonomous Driving
Florian Wintel · 2026-05-25 · via cs.CV updates on arXiv.org


Abstract: End-to-end planning systems for autonomous driving are rapidly improving, especially in closed-loop simulation environments like CARLA. Many such driving systems either do not consider uncertainty as part of the plan itself or obtain it from specialized representations that do not generalize. In this paper, we propose EnDfuser, an end-to-end driving system that uses a diffusion model as the trajectory planner. EnDfuser leverages complex perception information, such as fused camera and LiDAR features, by combining attention pooling and trajectory planning in a single diffusion transformer module. Instead of committing to a single plan, EnDfuser uses ensemble diffusion to produce a distribution of candidate trajectories (128 in our case) from a single perception frame. Observing the full set of candidate trajectories makes uncertain, multimodal future trajectory spaces interpretable. Using this information, we design a simple safety rule that improves the system's driving score by 1.7% on the LAV benchmark. Our findings suggest that ensemble diffusion, used as a drop-in replacement for traditional point-estimate trajectory planning modules, can contribute to an uncertainty-aware decision-making process in end-to-end driving policies by modeling the uncertainty of the posterior trajectory distribution.
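The abstract does not say how the safety rule is defined, so the following is only an illustrative sketch of how a set of candidate trajectories could feed an uncertainty-aware decision. Everything in it is an assumption rather than the EnDfuser method: the function names `ensemble_spread` and `safe_target_speed`, the spread metric (mean distance of waypoints from the ensemble mean), the threshold, and the slow-down factor are all hypothetical, and the trajectories are synthetic stand-ins for diffusion samples.

```python
import math
import random

# A trajectory is a list of (x, y) waypoints in metres (assumed representation).
Trajectory = list[tuple[float, float]]


def ensemble_spread(trajectories: list[Trajectory]) -> float:
    """Mean Euclidean distance of each waypoint from the ensemble mean waypoint."""
    k = len(trajectories)
    horizon = len(trajectories[0])
    total = 0.0
    for t in range(horizon):
        mean_x = sum(traj[t][0] for traj in trajectories) / k
        mean_y = sum(traj[t][1] for traj in trajectories) / k
        total += sum(
            math.hypot(traj[t][0] - mean_x, traj[t][1] - mean_y)
            for traj in trajectories
        )
    return total / (k * horizon)


def safe_target_speed(
    trajectories: list[Trajectory],
    nominal_speed: float,
    spread_threshold: float = 1.0,  # hypothetical threshold in metres
    slow_factor: float = 0.5,       # hypothetical slow-down factor
) -> float:
    """Hypothetical rule: reduce speed when the candidates disagree strongly."""
    if ensemble_spread(trajectories) > spread_threshold:
        return nominal_speed * slow_factor
    return nominal_speed


K, T = 128, 8  # 128 candidates, as in the abstract; 8 waypoints is an assumption
rng = random.Random(0)

# Unimodal case: all candidates follow a straight path with small noise.
confident = [
    [(2.0 * t + rng.gauss(0.0, 0.05), rng.gauss(0.0, 0.05)) for t in range(T)]
    for _ in range(K)
]

# Bimodal case: half of the candidates veer left, the other half veer right.
bimodal = [
    [(2.0 * t, (0.5 if i < K // 2 else -0.5) * t) for t in range(T)]
    for i in range(K)
]

speed_confident = safe_target_speed(confident, nominal_speed=8.0)
speed_bimodal = safe_target_speed(bimodal, nominal_speed=8.0)
```

In this toy setup the tightly clustered ensemble keeps the nominal speed, while the ensemble that splits into two modes triggers the slow-down. A point-estimate planner would expose no such signal, which is the motivation the abstract gives for inspecting the full candidate set.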
Comments: Accepted at NLDL 2026
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2506.00560 [cs.RO]
  (or arXiv:2506.00560v2 [cs.RO] for this version)
  https://doi.org/10.48550/arXiv.2506.00560 (arXiv-issued DOI via DataCite)

Submission history

From: Florian Wintel
[v1] Sat, 31 May 2025 13:33:27 UTC (2,834 KB)
[v2] Fri, 22 May 2026 14:13:18 UTC (6,307 KB)