惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

GbyAI
GbyAI
博客园_首页
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
阮一峰的网络日志
阮一峰的网络日志
酷 壳 – CoolShell
酷 壳 – CoolShell
博客园 - 司徒正美
V
V2EX
Cloudbric
Cloudbric
Hugging Face - Blog
Hugging Face - Blog
腾讯CDC
量子位
博客园 - 三生石上(FineUI控件)
博客园 - 叶小钗
K
Kaspersky official blog
博客园 - 【当耐特】
T
Tenable Blog
L
Lohrmann on Cybersecurity
The Cloudflare Blog
S
Schneier on Security
A
Arctic Wolf
Latest news
Latest news
C
Cyber Attacks, Cyber Crime and Cyber Security
罗磊的独立博客
T
The Exploit Database - CXSecurity.com
Cisco Talos Blog
Cisco Talos Blog
小众软件
小众软件
P
Privacy & Cybersecurity Law Blog
WordPress大学
WordPress大学
Simon Willison's Weblog
Simon Willison's Weblog
雷峰网
雷峰网
NISL@THU
NISL@THU
人人都是产品经理
人人都是产品经理
月光博客
月光博客
J
Java Code Geeks
V
Visual Studio Blog
S
Security Affairs
博客园 - Franky
T
Tailwind CSS Blog
Apple Machine Learning Research
Apple Machine Learning Research
H
Heimdal Security Blog
有赞技术团队
有赞技术团队
V2EX - 技术
V2EX - 技术
AWS News Blog
AWS News Blog
G
GRAHAM CLULEY
T
Troy Hunt's Blog
SecWiki News
SecWiki News
Spread Privacy
Spread Privacy
宝玉的分享
宝玉的分享
www.infosecurity-magazine.com
www.infosecurity-magazine.com
博客园 - 聂微东

cs.RO updates on arXiv.org

FineCog-Nav: Integrating Fine-grained Cognitive Modules for Zero-shot Multimodal UAV Navigation DENALI: A Dataset Enabling Non-Line-of-Sight Spatial Reasoning with Low-Cost LiDARs SENSE: Stereo OpEN Vocabulary SEmantic Segmentation Continual Hand-Eye Calibration for Open-world Robotic Manipulation PLAF: Pixel-wise Language-Aligned Feature Extraction for Efficient 3D Scene Understanding GaussianFlow SLAM: Monocular Gaussian Splatting SLAM Guided by GaussianFlow GIST: Multimodal Knowledge Extraction and Spatial Grounding via Intelligent Semantic Topology $π_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities R3D: Revisiting 3D Policy Learning Vision-Based Safe Human-Robot Collaboration with Uncertainty Guarantees Benchmarking Classical Coverage Path Planning Heuristics on Irregular Hexagonal Grids for Maritime Coverage Scenarios NEAT-NC: NEAT guided Navigation Cells for Robot Path Planning HRDexDB: A Large-Scale Dataset of Dexterous Human and Robotic Hand Grasps ADAPT: Benchmarking Commonsense Planning under Unspecified Affordance Constraints An Intelligent Robotic and Bio-Digestor Framework for Smart Waste Management Efficient closed-form approaches for pose estimation using Sylvester forms World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems A Nonasymptotic Theory of Gain-Dependent Error Dynamics in Behavior Cloning CooperDrive: Enhancing Driving Decisions Through Cooperative Perception SpaceMind: A Modular and Self-Evolving Embodied Vision-Language Agent Framework for Autonomous On-orbit Servicing HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System UMI-3D: Extending Universal Manipulation Interface from Vision-Limited to 3D Spatial Perception Towards Multi-Object-Tracking with Radar on a Fast Moving Vehicle: On the Potential of Processing Radar in the Frequency Domain Beyond Conservative Automated Driving in Multi-Agent Scenarios via Coupled Model Predictive Control and Deep Reinforcement Learning Failure Identification in Imitation Learning Via Statistical and Semantic Filtering A Dynamic-Growing Fuzzy-Neuro Controller, Application to a 3PSP Parallel Robot Vision-Language-Action Jump-Starting for Reinforcement Learning Robotic Agents A Mechanistic Analysis of Sim-and-Real Co-Training in Generative Robot Policies ESCAPE: Episodic Spatial Memory and Adaptive Execution Policy for Long-Horizon Mobile Manipulation Evolvable Embodied Agent for Robotic Manipulation via Long Short-Term Reflection and Optimization Chain of Uncertain Rewards with Large Language Models for Reinforcement Learning RadarSplat-RIO: Indoor Radar-Inertial Odometry with Gaussian Splatting-Based Radar Bundle Adjustment RobotPan: A 360$^\circ$ Surround-View Robotic Vision System for Embodied Perception Diffusion Sequence Models for Generative In-Context Meta-Learning of Robot Dynamics GeoVision-Enabled Digital Twin for Hybrid Autonomous-Teleoperated Medical Responses 4th Workshop on Maritime Computer Vision (MaCVi): Challenge Overview Multi-modal panoramic 3D outdoor datasets for place categorization Learning Probabilistic Responsibility Allocations for Multi-Agent Interactions Solving Physics Olympiad via Reinforcement Learning on Physics Simulators StarVLA-$α$: Reducing Complexity in Vision-Language-Action Systems Grounded World Model for Semantically Generalizable Planning SCORP: Scene-Consistent Multi-agent Diffusion Planning with Stable Online Reinforcement Post-Training for Cooperative Driving Agentic Driving Coach: Robustness and Determinism of Agentic AI-Powered Human-in-the-Loop Cyber-Physical Systems AffordSim: A Scalable Data Generator and Benchmark for Affordance-Aware Robotic Manipulation Efficient Emotion-Aware Iconic Gesture Prediction for Robot Co-Speech Minimal Embodiment Enables Efficient Learning of Number Concepts in Robot Learning to Forget -- Hierarchical Episodic Memory for Lifelong Robot Deployment 3D-Anchored Lookahead Planning for Persistent Robotic Scene Memory via World-Model-Based MCTS EmbodiedGovBench: A Benchmark for Governance, Recovery, and Upgrade Safety in Embodied Agent Systems Federated Single-Agent Robotics: Multi-Robot Coordination Without Intra-Robot Multi-Agent Fragmentation Robust Adversarial Policy Optimization Under Dynamics Uncertainty BridgeSim: Unveiling the OL-CL Gap in End-to-End Autonomous Driving AffordGen: Generating Diverse Demonstrations for Generalizable Object Manipulation with Afford Correspondence Genie 4D: Semantic-Prior-Guided 4D Dynamic Scene Reconstruction RoboLab: A High-Fidelity Simulation Benchmark for Analysis of Task Generalist Policies ProGAL-VLA: Grounded Alignment through Prospective Reasoning in Vision-Language-Action Models PhysInOne: Visual Physics Learning and Reasoning in One Suite C$^2$T: Captioning-Structure and LLM-Aligned Common-Sense Reward Learning for Traffic--Vehicle Coordination WOMBET: World Model-Based Experience Transfer for Robust and Sample-efficient Reinforcement Learning Accelerating Transformer-Based Monocular SLAM via Geometric Utility Scoring 3D-VCD: Hallucination Mitigation in 3D-LLM Embodied Agents through Visual Contrastive Decoding Action Images: End-to-End Policy Learning via Multiview Video Generation Towards Generalizable Robotic Manipulation in Dynamic Environments General-purpose LLMs as Models of Human Driver Behavior: The Case of Simplified Merging Uncertainty, Vagueness, and Ambiguity in Human-Robot Interaction: Why Conceptualization Matters IROSA: Interactive Robot Skill Adaptation using Natural Language Online Navigation Planning for Long-term Autonomous Operation of Underwater Gliders Optimized Human-Robot Co-Dispatch Planning for Petro-Site Surveillance under Varying Criticalities MerNav: A Highly Generalizable Memory-Execute-Review Framework for Zero-Shot Object Goal Navigation From Instruction to Event: Sound-Triggered Mobile Manipulation Self-Organizing Dual-Buffer Adaptive Clustering Experience Replay (SODACER) for Safe Reinforcement Learning in Optimal Control Enhanced-FQL($λ$), an Efficient and Interpretable RL with novel Fuzzy Eligibility Traces and Segmented Experience Replay LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving Learning to Plan, Planning to Learn: Adaptive Hierarchical RL-MPC for Sample-Efficient Decision Making Target-Bench: Can Video World Models Achieve Mapless Path Planning with Semantic Targets? Robust Verification of Controllers under State Uncertainty via Hamilton-Jacobi Reachability Analysis Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion Volumetric Ergodic Control RoboTAG: End-to-end Robot Configuration Estimation via Topological Alignment Graph TwinOR: Photorealistic Digital Twins of Dynamic Operating Rooms for Embodied AI Research Multimodal Diffusion Forcing for Forceful Manipulation X-Diffusion: Training Diffusion Policies on Cross-Embodiment Human Demonstrations Hierarchical DLO Routing with Reinforcement Learning and In-Context Vision-language Models Flow with the Force Field: Learning 3D Compliant Flow Matching Policies from Force and Demonstration-Guided Simulation Data AFFORD2ACT: Affordance-Guided Automatic Keypoint Selection for Generalizable and Lightweight Robotic Manipulation HAMLET: Switch your Vision-Language-Action Model into a History-Aware Policy TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance Multi-Modal Manipulation via Multi-Modal Policy Consensus AutoDrive-R$^2$: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving Constrained Decoding for Safe Robot Navigation Foundation Models FCBV-Net: Category-Level Robotic Garment Smoothing via Feature-Conditioned Bimanual Value Prediction PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving LLM-based Realistic Safety-Critical Driving Video Generation Scalable Multi-Task Learning through Spiking Neural Networks with Adaptive Task-Switching Policy for Intelligent Autonomous Agents Learning to Play Piano in the Real World Scalable Unseen Objects 6-DoF Absolute Pose Estimation with Robotic Integration Sixth-Sense: Self-Supervised Learning of Spatial Awareness of Humans from a Planar Lidar Curriculum-based Sample Efficient Reinforcement Learning for Robust Stabilization of a Quadrotor Generative Models and Connected and Automated Vehicles: A Survey in Exploring the Intersection of Transportation and AI Convex Hulls of Reachable Sets
Visual Localization in 3D Maps: Comparing Point Cloud, Mesh, and NeRF Representations
Lintong Zhang, Yifu Tao, Jiarong Lin, Fu Zhang, Maurice Fallon · 2024-08-22 · via cs.RO updates on arXiv.org

Recent advances in mapping techniques have enabled the creation of highly accurate dense 3D maps during robotic missions, such as point clouds, meshes, or NeRF-based representations. These developments present new opportunities for reusing these maps for localization. However, there remains a lack of a unified approach that can operate seamlessly across different map representations. This paper presents and evaluates a global visual localization system capable of localizing a single camera image across various 3D map representations built using both visual and lidar sensing. Our system generates a database by synthesizing novel views of the scene, creating RGB and depth image pairs. Leveraging the precise 3D geometric map, our method automatically defines rendering poses, reducing the number of database images while preserving retrieval performance. To bridge the domain gap between real query camera images and synthetic database images, our approach utilizes learning-based descriptors and feature detectors. We evaluate the system's performance through extensive real-world experiments conducted in both indoor and outdoor settings, assessing the effectiveness of each map representation and demonstrating its advantages over traditional structure-from-motion (SfM) localization approaches. The results show that all three map representations can achieve consistent localization success rates of 55% and higher across various environments. NeRF synthesized images show superior performance, localizing query images at an average success rate of 72%. Furthermore, we demonstrate an advantage over SfM-based approaches that our synthesized database enables localization in the reverse travel direction which is unseen during the mapping process. Our system, operating in real-time on a mobile laptop equipped with a GPU, achieves a processing rate of 1Hz.