

CommandSwarm: Safety-Aware Natural Language-to-Behavior-Tree Generation for Robotic Swarms
Mohammed Majid, Amjad Yousef Majid · 2026-05-08 · via cs.RO updates on arXiv.org

Natural-language interfaces can make swarm robotics more accessible to non-expert operators, but they must translate ambiguous user intent into executable swarm behaviors without unsupported actions, malformed programs, or unsafe plans. This paper presents CommandSwarm, a safety-aware language-to-behavior-tree pipeline for generating XML behavior trees (BTs) from speech or text commands. The system combines multilingual translation, command-level safety filtering, constrained prompting, a LoRA-adapted large language model (LLM), and deterministic parser validation against a whitelist of executable swarm primitives. We evaluate eleven open LLMs with 6.7B to 14B parameters, all using 4-bit quantization, on representative swarm-control scenarios under zero-shot, one-shot, and two-shot prompting. Falcon3-Instruct-10B and Mistral-7B-v3 are the strongest prompt-engineered candidates, reaching BLEU scores above 0.60 and high syntactic validity in few-shot settings. LoRA adaptation of Falcon3-Instruct-10B on a 2,063-example synthetic instruction-BT corpus improves zero-shot BLEU from 0.267 to 0.663, ROUGE-L from 0.366 to 0.692, and parser-accepted syntactic validity from 0% to 72%. Translation experiments further show that SeamlessM4T v2-large and EuroLLM-9B provide the best quality-latency trade-offs for the multilingual front end. The results indicate that compact, quantized, domain-adapted LLMs can generate useful swarm BTs when embedded in a validated systems pipeline. They also show that parser acceptance and safety filtering remain necessary execution gates; generation quality alone is not sufficient for autonomous deployment.
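
The parser-validation gate described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not list the XML schema or the primitive names, so `BehaviorTree`, `Sequence`, `Selector`, `FormLine`, `Disperse`, and `ReturnToBase` are hypothetical placeholders, and the rule that primitives must be leaf nodes is an assumption. The sketch only shows the general idea of deterministically rejecting malformed XML and any node outside a fixed whitelist before execution.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Hypothetical node names; the abstract does not specify the real whitelist.
CONTROL_NODES = {"BehaviorTree", "Sequence", "Selector"}
SWARM_PRIMITIVES = {"FormLine", "Disperse", "ReturnToBase"}
ALLOWED_TAGS = CONTROL_NODES | SWARM_PRIMITIVES


def validate_bt(xml_text: str) -> Tuple[bool, str]:
    """Return (accepted, reason) for a generated behavior tree."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Malformed programs are rejected before any semantic check.
        return False, f"malformed XML: {exc}"
    if root.tag != "BehaviorTree":
        return False, f"unexpected root element: {root.tag}"
    for node in root.iter():
        if node.tag not in ALLOWED_TAGS:
            # Unsupported actions never reach the swarm.
            return False, f"unsupported node: {node.tag}"
        if node.tag in SWARM_PRIMITIVES and len(node) > 0:
            # Assumed rule: primitives are leaves and cannot wrap other nodes.
            return False, f"primitive has children: {node.tag}"
    return True, "ok"


if __name__ == "__main__":
    good = "<BehaviorTree><Sequence><FormLine/><ReturnToBase/></Sequence></BehaviorTree>"
    bad = "<BehaviorTree><Sequence><SelfDestruct/></Sequence></BehaviorTree>"
    print(validate_bt(good))
    print(validate_bt(bad))
```

Because the check is a plain tree walk against a fixed set, its verdict does not depend on the LLM's output quality, which is consistent with the abstract's point that parser acceptance acts as an execution gate separate from generation metrics such as BLEU.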