D2E-An Autonomous Decision-making Dataset involving Driver States and Human Evaluation
Zehong Ke, Yanbo Jiang, Yuning Wang, Hao Cheng, Jinhao Li, Jianq · 2024-04-13 · via cs.CV updates on arXiv.org

With the advancement of deep learning, data-driven methods are increasingly used for decision-making in autonomous driving, and dataset quality greatly influences model performance. Although current datasets have made significant progress in collecting vehicle and environment data, they place insufficient emphasis on human-end data, including driver states and human evaluation. In addition, existing datasets consist mostly of simple scenarios such as car following, resulting in low interaction levels. In this paper, we introduce the Driver to Evaluation dataset (D2E), an autonomous decision-making dataset that contains data on driver states, vehicle states, environmental situations, and evaluation scores from human reviewers, covering the complete process of vehicle decision-making. Beyond the usual agent and surrounding-environment information, we collect driver factor data, including first-person view videos, physiological signals, and eye attention data, and we provide subjective rating scores from 40 human volunteers. The dataset combines driving simulator scenes with real-road ones. High-interaction situations are designed and filtered to ensure behavior diversity. After data organization, analysis, and preprocessing, D2E contains over 1100 segments of interactive driving case data, spanning human driver factors through evaluation results, and supports the development of data-driven decision-making algorithms.
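
To make the four data groups named in the abstract concrete, the following is a minimal sketch of how one D2E segment might be represented in memory. It is not the dataset's published schema: the class name `D2ESegment`, every field name, the signal keys, and the rating values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List


@dataclass
class D2ESegment:
    """Hypothetical record for one interactive driving segment.

    All names here are illustrative assumptions, not the official D2E format.
    """
    segment_id: str
    source: str  # assumed values: "simulator" or "real_road"
    # Each group maps a signal name to its time series.
    driver_state: Dict[str, List[float]] = field(default_factory=dict)
    vehicle_state: Dict[str, List[float]] = field(default_factory=dict)
    environment: Dict[str, List[float]] = field(default_factory=dict)
    # Subjective scores keyed by an (assumed) reviewer identifier.
    ratings: Dict[str, float] = field(default_factory=dict)

    def mean_rating(self) -> float:
        """Average the reviewers' scores for this segment."""
        if not self.ratings:
            raise ValueError("segment has no reviewer ratings")
        return mean(self.ratings.values())


# Usage with made-up values; the signal names and rating scale are assumptions.
segment = D2ESegment(
    segment_id="seg-0001",
    source="simulator",
    driver_state={"heart_rate_bpm": [72.0, 74.0, 75.0]},
    vehicle_state={"speed_mps": [10.0, 10.5, 11.0]},
    ratings={"reviewer_01": 4.0, "reviewer_02": 3.0, "reviewer_03": 5.0},
)
print(segment.mean_rating())
```

Keeping the reviewer scores alongside the driver, vehicle, and environment signals in a single record mirrors the abstract's "driver factor to evaluation results" framing, but the actual file layout of the release may differ.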