LeRobot v0.5.0: Scaling Every Dimension


Steven Palma, Pepijn Kooijmans, Jade Choghari, Caroline Pascal, · 2026-03-09 · via Hugging Face - Blog

With over 200 merged PRs and over 50 new contributors since v0.4.0, LeRobot v0.5.0 is our biggest release yet — expanding in every direction at once. More robots (including our first humanoid), more policies (including the comeback of autoregressive VLAs), faster datasets, simulation environments you can load straight from the Hub, and a modernized codebase running on Python 3.12 and Transformers v5. Whether you're training policies in simulation or deploying them on real hardware, v0.5.0 has something for you.

TL;DR

LeRobot v0.5.0 adds full Unitree G1 humanoid support with whole-body control models, new policies (including Pi0-FAST autoregressive VLAs and Real-Time Chunking for responsive inference), and streaming video encoding that eliminates wait times between recording episodes. The release also introduces EnvHub for loading simulation environments from the Hugging Face Hub, NVIDIA IsaacLab-Arena integration, and a major codebase modernization with Python 3.12+, Transformers v5, and third-party policy plugins.

Hardware: More Robots Than Ever

LeRobot v0.5.0 dramatically expands the roster of supported hardware — from arms and mobile robots to a full humanoid.

Unitree G1 Humanoid

The biggest hardware addition in this release: full Unitree G1 humanoid support. This is LeRobot's first humanoid integration, and it's comprehensive:

Locomotion: Walk, navigate, and move through environments.
Manipulation: Perform dexterous object manipulation tasks.
Teleoperation: Control the G1 remotely with an intuitive teleoperation interface.
Whole-Body Control (WBC): Coordinate locomotion and manipulation simultaneously for complex, real-world tasks.

The G1 integration represents a major step toward general-purpose robotics within LeRobot — moving beyond tabletop arms into full-body embodied AI. Try it out yourself by following the documentation.

OpenArm & OpenArm Mini

We've added support for the OpenArm robot and its companion OpenArm Mini teleoperator. OpenArm is a capable robot arm with full LeRobot integration, and the Mini serves as its natural teleoperation device. Both support bi-manual configurations, enabling dual-arm setups for more complex manipulation tasks. Check it out in the documentation.

More Robots

The hardware ecosystem keeps growing:

Earth Rover: Our first mobile robot integration, bringing LeRobot to outdoor navigation and ground-level robotics.
OMX Robot: A new robot arm with configurable gripper settings and calibration support.
SO-100/SO-101 Consolidation: We've unified the SO-100 and SO-101 implementations into a single, cleaner codebase — including bi-manual setups. Less code duplication, easier maintenance, same great robots.

CAN Bus Motors

New motor controller support via CAN (Controller Area Network) bus opens the door to higher-performance actuators:

RobStride: A CAN-based motor controller for high-torque applications.
Damiao: Another CAN bus motor controller, expanding the range of compatible hardware.

These additions mean LeRobot can now drive a wider variety of professional-grade actuators beyond the existing Dynamixel and Feetech ecosystem.
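
To give a feel for what driving a motor over CAN involves, here is a minimal sketch of packing floating-point commands into an 8-byte frame payload. The field layout, value ranges, and function names are all hypothetical; this is not the RobStride or Damiao protocol, nor LeRobot's motor API.

```python
import struct

def float_to_uint(value: float, lo: float, hi: float, bits: int) -> int:
    """Map a float in [lo, hi] onto an unsigned integer with `bits` bits."""
    value = min(max(value, lo), hi)  # clamp to the valid range
    span = (1 << bits) - 1
    return round((value - lo) / (hi - lo) * span)

def uint_to_float(raw: int, lo: float, hi: float, bits: int) -> float:
    """Inverse mapping, up to quantization error."""
    span = (1 << bits) - 1
    return lo + raw / span * (hi - lo)

def pack_command(position: float, velocity: float) -> bytes:
    """Hypothetical layout: two 16-bit big-endian fields, zero-padded to 8 bytes."""
    pos_raw = float_to_uint(position, -12.5, 12.5, 16)
    vel_raw = float_to_uint(velocity, -30.0, 30.0, 16)
    return struct.pack(">HH4x", pos_raw, vel_raw)

def unpack_command(payload: bytes) -> tuple[float, float]:
    pos_raw, vel_raw = struct.unpack(">HH4x", payload)
    return (
        uint_to_float(pos_raw, -12.5, 12.5, 16),
        uint_to_float(vel_raw, -30.0, 30.0, 16),
    )

payload = pack_command(1.57, 0.0)
position, velocity = unpack_command(payload)
```

The round trip is lossy by design: each field is quantized to 16 bits, so the decoded values match the originals only up to the step size of the chosen range.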

Policies: A Growing Model Zoo

This release brings six new policies and techniques into LeRobot, pushing the boundaries of what's possible with open-source robot learning.

Pi0-FAST: Autoregressive VLAs

Pi0-FAST brings autoregressive Vision-Language-Action models to LeRobot with FAST (Frequency-space Action Sequence Tokenization). Unlike the flow-matching approach of Pi0, Pi0-FAST uses an autoregressive action expert (based on Gemma 300M) that generates discretized action tokens, enabling:

FAST tokenization: Actions are tokenized for autoregressive decoding, with a dedicated FAST action tokenizer.
Flexible decoding: Configurable temperature and max decoding steps for balancing speed and quality.
RTC-compatible: Works with Real-Time Chunking (see next section) for responsive inference.

lerobot-train \
  --policy.type=pi0_fast \
  --dataset.repo_id=lerobot/aloha_sim_insertion_human \
  --policy.device=cuda
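
As a rough intuition for frequency-space tokenization, the toy sketch below transforms one joint's action chunk with a discrete cosine transform (DCT) and rounds the coefficients to integers. It is an illustration under simplifying assumptions, not the FAST tokenizer shipped with LeRobot; the function names, the scale factor, and the trajectory values are made up.

```python
import math

def dct(signal: list[float]) -> list[float]:
    """Orthonormal DCT-II: time-domain samples -> frequency coefficients."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        total = sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal))
        coeffs.append(scale * total)
    return coeffs

def idct(coeffs: list[float]) -> list[float]:
    """Inverse of the orthonormal DCT-II above."""
    n = len(coeffs)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n) * c * math.cos(math.pi / n * (i + 0.5) * k)
            for k, c in enumerate(coeffs)
        )
        for i in range(n)
    ]

def tokenize(actions: list[float], scale: float = 100.0) -> list[int]:
    """Quantize DCT coefficients to integers (the 'tokens' of this toy example)."""
    return [round(c * scale) for c in dct(actions)]

def detokenize(tokens: list[int], scale: float = 100.0) -> list[float]:
    """Map integer tokens back to an approximate action trajectory."""
    return idct([t / scale for t in tokens])

# One joint's trajectory over an 8-step action chunk (made-up values).
chunk = [0.00, 0.05, 0.12, 0.20, 0.27, 0.33, 0.37, 0.40]
tokens = tokenize(chunk)
reconstructed = detokenize(tokens)
max_error = max(abs(a - b) for a, b in zip(chunk, reconstructed))
```

Smooth trajectories concentrate their energy in the low-frequency coefficients, which is what makes a frequency-space representation compact compared with tokenizing every time step separately.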

Real-Time Chunking (RTC)

Real-Time Chunking is an inference-time technique from Physical Intelligence that makes flow-matching policies dramatically more responsive. Instead of waiting for a full action chunk to finish before replanning, RTC continuously blends new predictions with in-progress actions, producing smoother and more reactive behavior.

RTC is not a standalone policy — it's an enhancement that plugs into existing flow-matching policies (Pi0 family, SmolVLA & Diffusion). Configure it via --policy.rtc_config.enabled=true.

This matters most in real-world deployment, where inference latency directly limits how quickly a robot can react. Read the original paper for the technical details, and see our documentation for usage.
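
The blending idea can be pictured with a deliberately simplified sketch: keep the actions that will execute while inference is still running, then crossfade toward the newly predicted chunk. The actual method described in the paper is more involved than this action-space crossfade, and `blend_chunks` and `frozen_steps` are hypothetical names, not LeRobot parameters.

```python
def blend_chunks(old_chunk: list[float], new_chunk: list[float], frozen_steps: int) -> list[float]:
    """Crossfade from the in-progress chunk to a freshly predicted one.

    The first `frozen_steps` actions are kept from the old chunk, since they
    execute while inference is still running. The weight of the new chunk
    then ramps up linearly over the remaining steps.
    """
    if len(old_chunk) != len(new_chunk):
        raise ValueError("chunks must cover the same time steps")
    horizon = len(old_chunk)
    blended = []
    for step, (old, new) in enumerate(zip(old_chunk, new_chunk)):
        if step < frozen_steps:
            weight_new = 0.0
        else:
            weight_new = (step - frozen_steps + 1) / (horizon - frozen_steps)
        blended.append((1.0 - weight_new) * old + weight_new * new)
    return blended

old_chunk = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
new_chunk = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
blended = blend_chunks(old_chunk, new_chunk, frozen_steps=2)
```

Because the transition is gradual, the executed trajectory has no jump at the point where the new chunk takes over.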

Wall-X

Wall-X is a new VLA policy built on Qwen2.5-VL with flow-matching action prediction. It combines the strong vision-language understanding of Qwen2.5-VL with a flow-matching head for cross-embodiment robotic control.

pip install "lerobot[wall_x]"
lerobot-train \
  --policy.type=wall_x \
  --dataset.repo_id=lerobot/aloha_sim_insertion_human
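
For readers new to flow matching, the sketch below shows the sampling side of the idea: start from noise and integrate a velocity field until an action emerges. It is a toy under strong assumptions. The velocity is computed from a known target instead of being predicted by a trained network, and none of the names come from Wall-X or LeRobot.

```python
import random

def ideal_velocity(x: list[float], t: float, target: list[float]) -> list[float]:
    """Velocity of the straight-line path from the current point to `target`.

    A trained model would predict this from observations; here it is computed
    from the known target purely for illustration.
    """
    return [(tgt - xi) / (1.0 - t) for xi, tgt in zip(x, target)]

def sample_action(target: list[float], num_steps: int = 10, seed: int = 0) -> list[float]:
    """Integrate the velocity field from t=0 (noise) to t=1 (action) with Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = ideal_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

target_action = [0.3, -0.1, 0.8]
action = sample_action(target_action)
```

With a learned velocity field the result is only approximately the intended action, and the number of integration steps trades inference speed against accuracy.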

X-VLA

X-VLA brings a Florence2-based VLA to LeRobot. Built on Microsoft's Florence-2 vision-language model, X-VLA offers an alternative backbone for VLA policies, expanding the diversity of foundation models available for robot learning. Check out the training guide for setup instructions and the base model.

pip install "lerobot[xvla]"
lerobot-train \
  --policy.type=xvla \
  --dataset.repo_id=lerobot/bimanual-so100-handover-cube

SARM

SARM (Stage-Aware Reward Modeling) tackles one of the hardest problems in robot learning: long-horizon tasks. Instead of using a single global linear progress signal over the whole episode, it models progress in a stage-aware manner by predicting both the task stage and the progress within that stage. This makes it much easier to train policies for complex, multi-step manipulation tasks. Start experimenting with it by following the documentation.
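
One way to picture a stage-aware signal is to combine a predicted stage index with the predicted progress inside that stage. The sketch below is an assumption about how such a combination could look, not SARM's implementation; the stage weights and the function name are hypothetical.

```python
def overall_progress(stage: int, within_stage: float, stage_weights: list[float]) -> float:
    """Combine a stage index and within-stage progress into episode progress in [0, 1].

    `stage_weights` gives each stage's share of the task and should sum to 1.
    """
    if not 0 <= stage < len(stage_weights):
        raise ValueError("stage index out of range")
    within_stage = min(max(within_stage, 0.0), 1.0)  # clamp to [0, 1]
    completed = sum(stage_weights[:stage])           # stages already finished
    return completed + stage_weights[stage] * within_stage

# Hypothetical three-stage task: reach (20%), grasp (30%), place (50%).
weights = [0.2, 0.3, 0.5]
progress = overall_progress(stage=1, within_stage=0.5, stage_weights=weights)
```

Unlike a single linear ramp over the episode, this signal does not advance while the robot is stuck inside a stage, however many frames elapse.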

PEFT Support

You can now fine-tune large VLAs using LoRA (and other PEFT methods) without modifying the core training pipeline. PEFT configuration lives at the policy level, making it straightforward to adapt massive foundation models to your specific robot and task with a fraction of the compute. Learn more by reading the documentation.

lerobot-train \
  --policy.type=pi0 \
  --policy.peft_config.use_peft=true \
  --dataset.repo_id=lerobot/aloha_sim_insertion_human
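
To see where the compute savings come from, here is a small pure-Python sketch of the LoRA update: the frozen weight W is adjusted by a low-rank product, W + (alpha / r) * B @ A, and only A and B are trained. The matrices and layer sizes are made-up examples, and the helper names are not part of LeRobot or the PEFT library.

```python
def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_weight(w, a, b, alpha: float):
    """Effective weight W + (alpha / r) * B @ A, where r is the number of rows of A."""
    rank = len(a)
    delta = matmul(b, a)
    return [
        [wij + alpha / rank * dij for wij, dij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]

def trainable_params(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full fine-tuning, LoRA) trainable parameter counts for one layer."""
    return d_out * d_in, rank * (d_out + d_in)

w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen 2x3 weight
a = [[1.0, 2.0, 3.0]]                   # rank-1 factor A (1x3)
b = [[0.0], [0.5]]                      # rank-1 factor B (2x1)
adapted = lora_weight(w, a, b, alpha=2.0)
full, lora = trainable_params(4096, 4096, rank=8)
```

For the 4096 x 4096 example layer, a rank of 8 leaves 65,536 trainable parameters instead of 16,777,216, which is under half a percent of the original.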

Datasets: Faster Recording, Faster Training

The dataset pipeline gets major performance improvements in this release, making both data collection and training significantly faster.

Streaming Video Encoding

Previously, recording a dataset meant waiting after each episode for video encoding to finish. No more. With streaming video encoding, frames are encoded in real-time as they're captured — meaning zero wait time between episodes. Just finish one episode and immediately start the next.

Streaming encoding also supports hardware encoder auto-detection, so if your system has a GPU-accelerated video encoder, LeRobot will use it automatically:

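# Import path assumed from recent LeRobot releases; adjust it if your version differs.
from lerobot.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset.create(
    repo_id="my/dataset",
    fps=30,
    video_backend="auto",       # Auto-detect best HW encoder
    streaming_encoding=True,    # Encode in real-time
    # Other arguments, such as the feature definitions, are omitted here.
)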

Streaming encoding performance can vary significantly depending on your hardware and recording setup (number of cameras, resolution, etc.). Make sure to review the streaming video encoding documentation before enabling it.
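
The principle behind streaming encoding is a producer-consumer pipeline: the capture loop hands frames to a background worker that encodes them immediately. The sketch below shows that pattern with the standard library only. `zlib` stands in for a real video codec, and every name in it is hypothetical, so it says nothing about how LeRobot implements the feature.

```python
import queue
import threading
import zlib

def encoder_worker(frames: queue.Queue, encoded: list[bytes]) -> None:
    """Consume frames as they arrive and 'encode' them (zlib stands in for a codec)."""
    while True:
        frame = frames.get()
        if frame is None:  # sentinel: the episode is over
            break
        encoded.append(zlib.compress(frame))

def record_episode(num_frames: int, frame_size: int = 1024) -> list[bytes]:
    frames: queue.Queue = queue.Queue(maxsize=32)  # bounded, so capture cannot outrun memory
    encoded: list[bytes] = []
    worker = threading.Thread(target=encoder_worker, args=(frames, encoded))
    worker.start()
    for index in range(num_frames):
        frame = bytes([index % 256]) * frame_size  # placeholder for a camera frame
        frames.put(frame)                          # encoding proceeds while capture continues
    frames.put(None)
    worker.join()  # only the frames still in the queue remain to be flushed
    return encoded

encoded_frames = record_episode(num_frames=50)
```

The bounded queue is the part that depends on hardware: if encoding is slower than capture, `put` blocks and recording stalls, which is one reason performance varies with camera count and resolution.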

10x Faster Image Training, 3x Faster Encoding

Under the hood, we've fixed key data access bottlenecks and overhauled image processing:

10x faster image training: Improved image transform support and fixed data access bottlenecks that were silently slowing down training.
3x faster encoding: When streaming is not used, parallel encoding is now the default across all platforms, with dynamic compression levels that adapt to your dataset type (video vs. image).
Better CPU utilization: More efficient resource usage during recording and dataset creation.

New Dataset Tools

The dataset editing toolkit continues to grow:

Subtask support: Annotate and query subtasks within episodes for hierarchical task learning.
Image-to-video conversion: Convert existing image-based datasets to video format for better storage efficiency, with support for multiple episodes per video file.
More editing operations: New info operation for inspecting datasets, task modification tools, and numerous fixes to existing operations (splitting, merging, feature editing).
More configuration options: Video codecs, tolerance settings, and metadata buffer sizes are now configurable for fine-grained control over dataset creation.

EnvHub: Environments from the Hub

EnvHub is a new way to use simulation environments in LeRobot: load them directly from the Hugging Face Hub. Instead of installing environment packages locally and wiring up registration, you can now point LeRobot at a Hub repository and it handles everything — downloading the environment code, registering it with Gymnasium, and making it available for training and evaluation.

Hub environments use HubEnvConfig, which downloads and executes remote make_env functions:

lerobot-train \
  --env.type=hub \
  --env.hub_path="username/my-custom-env" \
  --policy.type=act

This lowers the barrier for sharing custom simulation environments with the community. Package your environment, push it to the Hub, and anyone can train on it. Check out the documentation to learn more. Here's an example to get started: LeIsaac x LeRobot EnvHub tutorial.
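
Conceptually, loading an environment from a repository comes down to importing a Python file and calling its `make_env` function. The sketch below reproduces that mechanism locally with the standard library. The file name, the `make_env` signature, and its return value are assumptions for illustration, and a temporary file stands in for code downloaded from the Hub.

```python
import importlib.util
import tempfile
from pathlib import Path

ENV_SOURCE = '''
def make_env(n_envs=1):
    """Hypothetical entry point: return a list of environment descriptions."""
    return [{"id": f"my-custom-env-{i}"} for i in range(n_envs)]
'''

def load_make_env(env_file: Path):
    """Import a Python file by path and return its `make_env` function."""
    spec = importlib.util.spec_from_file_location("hub_env", env_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the file: only do this with trusted code
    return module.make_env

with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / "env.py"  # stands in for a file downloaded from the Hub
    env_file.write_text(ENV_SOURCE)
    make_env = load_make_env(env_file)
    envs = make_env(n_envs=2)
```

Because this pattern runs code from the repository on your machine, it is worth reviewing a Hub environment's source before training on it.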

NVIDIA IsaacLab-Arena

We've integrated NVIDIA IsaacLab-Arena, bringing GPU-accelerated simulation to LeRobot. IsaacLab-Arena provides a collection of manipulation tasks running on NVIDIA's Isaac Sim, offering massively parallel environment instances for fast reinforcement learning. The integration includes dedicated pre/post-processing steps and full compatibility with LeRobot's training pipeline. Check out the documentation.

Codebase: A Modern Foundation

This release modernizes the codebase:

Python 3.12+: LeRobot now requires Python 3.12 as the minimum version, enabling modern syntax and better performance.
Transformers v5: We've migrated to Hugging Face Transformers v5, staying current with the latest model ecosystem.
3rd-party policy plugins: Just like v0.4.0's hardware plugin system, you can now register custom policies as installable packages — pip install lerobot_policy_mypolicy and use it with --policy.type=mypolicy. No core library changes needed. Learn how to do it by following the documentation.
Remote Rerun visualization: Visualize your robot's telemetry remotely using Rerun, with compressed image support for bandwidth-efficient streaming.
Installation improvements: Added uv installation instructions, clarified setup steps, and improved dependency management. Sequential install steps are now clearly documented.
Documentation versioning: Docs are now versioned, so you can always find documentation matching your installed release.
PyTorch version bump: Updated PyTorch version bounds to support NVIDIA Blackwell GPUs.
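
The plugin idea rests on a name-to-class registry: a package registers its policy under a string, and the command-line value of --policy.type selects it. The sketch below shows that general pattern only. `register_policy`, `make_policy`, and `POLICY_REGISTRY` are hypothetical names, not LeRobot's registration API, so follow the documentation for the real mechanism.

```python
POLICY_REGISTRY: dict[str, type] = {}

def register_policy(name: str):
    """Class decorator that makes a policy selectable by name."""
    def decorator(cls: type) -> type:
        if name in POLICY_REGISTRY:
            raise ValueError(f"policy type {name!r} is already registered")
        POLICY_REGISTRY[name] = cls
        return cls
    return decorator

def make_policy(policy_type: str, **kwargs):
    """Instantiate the policy registered under `policy_type`."""
    try:
        cls = POLICY_REGISTRY[policy_type]
    except KeyError:
        known = sorted(POLICY_REGISTRY)
        raise ValueError(f"unknown policy type {policy_type!r}; known: {known}") from None
    return cls(**kwargs)

# This part would live in the third-party package (e.g. lerobot_policy_mypolicy).
@register_policy("mypolicy")
class MyPolicy:
    def __init__(self, hidden_dim: int = 128):
        self.hidden_dim = hidden_dim

policy = make_policy("mypolicy", hidden_dim=256)
```

Keeping registration in the plugin package is what allows new policies to be added without touching the core library.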

Community & Ecosystem

Modernized Discord: Reorganized the channels of our Discord server, the community's most active hub.
GitHub README, templates & automated labeling: A refreshed README, new issue and PR templates, contributing guidelines, and automatic labeling of tickets — making it easier for everyone to contribute.
ICLR 2026 paper acceptance: The LeRobot paper has been accepted to ICLR 2026!
LeRobot Visualizer refresh: The visualization tool got a refresh with new dataset visualization badges and improved functionality. Check it out!
LeRobot Annotation Studio: A Hugging Face Space designed to easily annotate every moment of your dataset with natural language subtasks. Check it out!

Final Thoughts

Beyond these headline features, v0.5.0 includes hundreds of bug fixes, documentation improvements, CI/CD enhancements, and quality-of-life improvements across the entire codebase. From better type checking to more robust test infrastructure, we're investing in the foundations that make LeRobot reliable and maintainable as it scales.

We want to extend a huge thank you to everyone in the community — contributors, users, and collaborators alike — for helping LeRobot grow into what it is today. Every bug report, PR, and discussion makes this project better.

Stay tuned for more to come 🤗 Get started here! – The LeRobot team ❤️

There's a big surprise coming right around the corner, stay tuned! 👕
