惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

美团技术团队
T
Troy Hunt's Blog
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
S
Schneier on Security
cs.CL updates on arXiv.org
cs.CL updates on arXiv.org
Cisco Talos Blog
Cisco Talos Blog
AWS News Blog
AWS News Blog
NISL@THU
NISL@THU
The Hacker News
The Hacker News
Know Your Adversary
Know Your Adversary
L
Lohrmann on Cybersecurity
SecWiki News
SecWiki News
S
Security Affairs
www.infosecurity-magazine.com
www.infosecurity-magazine.com
Help Net Security
Help Net Security
L
LINUX DO - 热门话题
Application and Cybersecurity Blog
Application and Cybersecurity Blog
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
I
Intezer
S
Secure Thoughts
罗磊的独立博客
Attack and Defense Labs
Attack and Defense Labs
G
GRAHAM CLULEY
Exploit-DB.com RSS Feed
Exploit-DB.com RSS Feed
博客园_首页
Cyberwarzone
Cyberwarzone
IT之家
IT之家
T
Threatpost
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
The Cloudflare Blog
博客园 - 叶小钗
Cloudbric
Cloudbric
量子位
Scott Helme
Scott Helme
N
News | PayPal Newsroom
L
LINUX DO - 最新话题
O
OpenAI News
C
Cyber Attacks, Cyber Crime and Cyber Security
Security Archives - TechRepublic
Security Archives - TechRepublic
C
Cybersecurity and Infrastructure Security Agency CISA
J
Java Code Geeks
有赞技术团队
有赞技术团队
月光博客
月光博客
大猫的无限游戏
大猫的无限游戏
W
WeLiveSecurity
宝玉的分享
宝玉的分享
P
Privacy International News Feed
A
Arctic Wolf
K
KPMG report finds enterprise disconnect between AI and its ROI | CIO
云风的 BLOG
云风的 BLOG

cs.RO updates on arXiv.org

FineCog-Nav: Integrating Fine-grained Cognitive Modules for Zero-shot Multimodal UAV Navigation DENALI: A Dataset Enabling Non-Line-of-Sight Spatial Reasoning with Low-Cost LiDARs SENSE: Stereo OpEN Vocabulary SEmantic Segmentation Continual Hand-Eye Calibration for Open-world Robotic Manipulation PLAF: Pixel-wise Language-Aligned Feature Extraction for Efficient 3D Scene Understanding GaussianFlow SLAM: Monocular Gaussian Splatting SLAM Guided by GaussianFlow GIST: Multimodal Knowledge Extraction and Spatial Grounding via Intelligent Semantic Topology $π_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities R3D: Revisiting 3D Policy Learning Vision-Based Safe Human-Robot Collaboration with Uncertainty Guarantees Benchmarking Classical Coverage Path Planning Heuristics on Irregular Hexagonal Grids for Maritime Coverage Scenarios NEAT-NC: NEAT guided Navigation Cells for Robot Path Planning HRDexDB: A Large-Scale Dataset of Dexterous Human and Robotic Hand Grasps ADAPT: Benchmarking Commonsense Planning under Unspecified Affordance Constraints An Intelligent Robotic and Bio-Digestor Framework for Smart Waste Management Efficient closed-form approaches for pose estimation using Sylvester forms World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems A Nonasymptotic Theory of Gain-Dependent Error Dynamics in Behavior Cloning CooperDrive: Enhancing Driving Decisions Through Cooperative Perception SpaceMind: A Modular and Self-Evolving Embodied Vision-Language Agent Framework for Autonomous On-orbit Servicing HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System UMI-3D: Extending Universal Manipulation Interface from Vision-Limited to 3D Spatial Perception Towards Multi-Object-Tracking with Radar on a Fast Moving Vehicle: On the Potential of Processing Radar in the Frequency Domain Beyond Conservative Automated Driving in Multi-Agent Scenarios via Coupled Model Predictive Control and Deep Reinforcement Learning Failure Identification in Imitation Learning Via Statistical and Semantic Filtering A Dynamic-Growing Fuzzy-Neuro Controller, Application to a 3PSP Parallel Robot Vision-Language-Action Jump-Starting for Reinforcement Learning Robotic Agents A Mechanistic Analysis of Sim-and-Real Co-Training in Generative Robot Policies ESCAPE: Episodic Spatial Memory and Adaptive Execution Policy for Long-Horizon Mobile Manipulation Evolvable Embodied Agent for Robotic Manipulation via Long Short-Term Reflection and Optimization Chain of Uncertain Rewards with Large Language Models for Reinforcement Learning RadarSplat-RIO: Indoor Radar-Inertial Odometry with Gaussian Splatting-Based Radar Bundle Adjustment RobotPan: A 360$^\circ$ Surround-View Robotic Vision System for Embodied Perception Diffusion Sequence Models for Generative In-Context Meta-Learning of Robot Dynamics GeoVision-Enabled Digital Twin for Hybrid Autonomous-Teleoperated Medical Responses 4th Workshop on Maritime Computer Vision (MaCVi): Challenge Overview Multi-modal panoramic 3D outdoor datasets for place categorization Learning Probabilistic Responsibility Allocations for Multi-Agent Interactions Solving Physics Olympiad via Reinforcement Learning on Physics Simulators StarVLA-$α$: Reducing Complexity in Vision-Language-Action Systems Grounded World Model for Semantically Generalizable Planning SCORP: Scene-Consistent Multi-agent Diffusion Planning with Stable Online Reinforcement Post-Training for Cooperative Driving Agentic Driving Coach: Robustness and Determinism of Agentic AI-Powered Human-in-the-Loop Cyber-Physical Systems AffordSim: A Scalable Data Generator and Benchmark for Affordance-Aware Robotic Manipulation Efficient Emotion-Aware Iconic Gesture Prediction for Robot Co-Speech Minimal Embodiment Enables Efficient Learning of Number Concepts in Robot Learning to Forget -- Hierarchical Episodic Memory for Lifelong Robot Deployment 3D-Anchored Lookahead Planning for Persistent Robotic Scene Memory via World-Model-Based MCTS EmbodiedGovBench: A Benchmark for Governance, Recovery, and Upgrade Safety in Embodied Agent Systems Federated Single-Agent Robotics: Multi-Robot Coordination Without Intra-Robot Multi-Agent Fragmentation Robust Adversarial Policy Optimization Under Dynamics Uncertainty BridgeSim: Unveiling the OL-CL Gap in End-to-End Autonomous Driving AffordGen: Generating Diverse Demonstrations for Generalizable Object Manipulation with Afford Correspondence Genie 4D: Semantic-Prior-Guided 4D Dynamic Scene Reconstruction RoboLab: A High-Fidelity Simulation Benchmark for Analysis of Task Generalist Policies ProGAL-VLA: Grounded Alignment through Prospective Reasoning in Vision-Language-Action Models PhysInOne: Visual Physics Learning and Reasoning in One Suite C$^2$T: Captioning-Structure and LLM-Aligned Common-Sense Reward Learning for Traffic--Vehicle Coordination WOMBET: World Model-Based Experience Transfer for Robust and Sample-efficient Reinforcement Learning Accelerating Transformer-Based Monocular SLAM via Geometric Utility Scoring 3D-VCD: Hallucination Mitigation in 3D-LLM Embodied Agents through Visual Contrastive Decoding Action Images: End-to-End Policy Learning via Multiview Video Generation Towards Generalizable Robotic Manipulation in Dynamic Environments General-purpose LLMs as Models of Human Driver Behavior: The Case of Simplified Merging Uncertainty, Vagueness, and Ambiguity in Human-Robot Interaction: Why Conceptualization Matters IROSA: Interactive Robot Skill Adaptation using Natural Language Online Navigation Planning for Long-term Autonomous Operation of Underwater Gliders Optimized Human-Robot Co-Dispatch Planning for Petro-Site Surveillance under Varying Criticalities MerNav: A Highly Generalizable Memory-Execute-Review Framework for Zero-Shot Object Goal Navigation From Instruction to Event: Sound-Triggered Mobile Manipulation Self-Organizing Dual-Buffer Adaptive Clustering Experience Replay (SODACER) for Safe Reinforcement Learning in Optimal Control Enhanced-FQL($λ$), an Efficient and Interpretable RL with novel Fuzzy Eligibility Traces and Segmented Experience Replay LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving Learning to Plan, Planning to Learn: Adaptive Hierarchical RL-MPC for Sample-Efficient Decision Making Target-Bench: Can Video World Models Achieve Mapless Path Planning with Semantic Targets? Robust Verification of Controllers under State Uncertainty via Hamilton-Jacobi Reachability Analysis Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion Volumetric Ergodic Control RoboTAG: End-to-end Robot Configuration Estimation via Topological Alignment Graph TwinOR: Photorealistic Digital Twins of Dynamic Operating Rooms for Embodied AI Research Multimodal Diffusion Forcing for Forceful Manipulation X-Diffusion: Training Diffusion Policies on Cross-Embodiment Human Demonstrations Hierarchical DLO Routing with Reinforcement Learning and In-Context Vision-language Models Flow with the Force Field: Learning 3D Compliant Flow Matching Policies from Force and Demonstration-Guided Simulation Data AFFORD2ACT: Affordance-Guided Automatic Keypoint Selection for Generalizable and Lightweight Robotic Manipulation HAMLET: Switch your Vision-Language-Action Model into a History-Aware Policy TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance Multi-Modal Manipulation via Multi-Modal Policy Consensus AutoDrive-R$^2$: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving Constrained Decoding for Safe Robot Navigation Foundation Models FCBV-Net: Category-Level Robotic Garment Smoothing via Feature-Conditioned Bimanual Value Prediction PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving LLM-based Realistic Safety-Critical Driving Video Generation Scalable Multi-Task Learning through Spiking Neural Networks with Adaptive Task-Switching Policy for Intelligent Autonomous Agents Learning to Play Piano in the Real World Scalable Unseen Objects 6-DoF Absolute Pose Estimation with Robotic Integration Sixth-Sense: Self-Supervised Learning of Spatial Awareness of Humans from a Planar Lidar Curriculum-based Sample Efficient Reinforcement Learning for Robust Stabilization of a Quadrotor Generative Models and Connected and Automated Vehicles: A Survey in Exploring the Intersection of Transportation and AI Convex Hulls of Reachable Sets
STRIVE: Structured Representation Integrating VLM Reasoning for Efficient Object Navigation
Haokun Zhu, Zongtai Li, Zhixuan Liu, Wenshan Wang, Ji Zhang, Jon · 2025-05-11 · via cs.RO updates on arXiv.org

Vision-Language Models (VLMs) have been increasingly integrated into object navigation tasks for their rich prior knowledge and strong reasoning abilities. However, applying VLMs to navigation poses two key challenges: effectively representing complex environment information and determining \textit{when and how} to query VLMs. Insufficient environment understanding and over-reliance on VLMs (e.g. querying at every step) can lead to unnecessary backtracking and reduced navigation efficiency, especially in continuous environments. To address these challenges, we propose a novel framework that constructs a multi-layer representation of the environment during navigation. This representation consists of viewpoint, object nodes, and room nodes. Viewpoints and object nodes facilitate intra-room exploration and accurate target localization, while room nodes support efficient inter-room planning. Building on this representation, we propose a novel two-stage navigation policy, integrating high-level planning guided by VLM reasoning with low-level VLM-assisted exploration to efficiently locate a goal object. We evaluated our approach on three simulated benchmarks (HM3D, RoboTHOR, and MP3D), and achieved state-of-the-art performance on both the success rate ($\mathord{\uparrow}\, 7.1\%$) and navigation efficiency ($\mathord{\uparrow}\, 12.5\%$). We further validate our method on a real robot platform, demonstrating strong robustness across 15 object navigation tasks in 10 different indoor environments. Project page is available at https://zwandering.github.io/STRIVE.github.io/ .