Disentangled Acoustic Fields For Multimodal Physical Scene Understanding
Jie Yin, Andrew Luo, Yilun Du, Anoop Cherian, Tim K. Marks, Jona · 2024-07-16 · via cs.RO updates on arXiv.org

We study the problem of multimodal physical scene understanding, where an embodied agent needs to find fallen objects by inferring the object properties, direction, and distance of an impact sound source. Previous works adopt feed-forward neural networks to regress these variables directly from sound, which leads to poor generalization and difficulty adapting to new domains. In this paper, we show that learning a disentangled model of acoustic formation, referred to as a disentangled acoustic field (DAF), which captures the sound generation and propagation process, enables the embodied agent to construct a spatial uncertainty map over where the objects may have fallen. We demonstrate that our analysis-by-synthesis framework can jointly infer sound properties by explicitly decomposing and factorizing the latent space of the disentangled model. We further show that the spatial uncertainty map can significantly improve the success rate of fallen-object localization by proposing multiple plausible exploration locations.
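
To make the idea of a spatial uncertainty map concrete, here is a minimal Python sketch. It is not the DAF model or the authors' implementation. It assumes a toy stand-in in which the agent already has a noisy estimate of the sound source's bearing and distance, and it models the error on each as an independent Gaussian. The function names (`uncertainty_map`, `propose_locations`), the grid layout, and all parameters (`sigma_angle`, `sigma_dist`, `min_separation`) are hypothetical and chosen only for illustration.

```python
import math


def uncertainty_map(size, agent, bearing, distance,
                    sigma_angle=0.3, sigma_dist=1.0):
    """Return a size x size grid of probabilities over the object's location.

    Toy assumption (not from the paper): the bearing estimate (radians) and
    the distance estimate (grid cells) have independent Gaussian errors.
    """
    ax, ay = agent
    grid = [[0.0] * size for _ in range(size)]
    total = 0.0
    for y in range(size):
        for x in range(size):
            dx, dy = x - ax, y - ay
            r = math.hypot(dx, dy)
            if r == 0:
                continue  # the agent's own cell has no defined bearing
            theta = math.atan2(dy, dx)
            # Wrap the angular difference into [-pi, pi].
            dtheta = math.atan2(math.sin(theta - bearing),
                                math.cos(theta - bearing))
            p = (math.exp(-0.5 * (dtheta / sigma_angle) ** 2)
                 * math.exp(-0.5 * ((r - distance) / sigma_dist) ** 2))
            grid[y][x] = p
            total += p
    # Normalize so the grid sums to 1.
    return [[p / total for p in row] for row in grid]


def propose_locations(grid, k=3, min_separation=2.0):
    """Greedily pick up to k high-probability cells that are spread apart."""
    cells = sorted(
        ((p, x, y) for y, row in enumerate(grid) for x, p in enumerate(row)),
        reverse=True,
    )
    picks = []
    for _, x, y in cells:
        # Skip cells too close to an already chosen exploration target.
        if all(math.hypot(x - px, y - py) >= min_separation
               for px, py in picks):
            picks.append((x, y))
            if len(picks) == k:
                break
    return picks


if __name__ == "__main__":
    grid = uncertainty_map(size=11, agent=(5, 5), bearing=0.0, distance=3.0)
    print(propose_locations(grid, k=3))
```

In this sketch the first proposal is the cell that matches the estimated bearing and distance exactly. The later proposals are lower-probability cells kept at least `min_separation` cells away from earlier picks, which gives the agent alternative places to search if the first guess fails.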