LeRobot Community Datasets: The “ImageNet” of Robotics


LeRobot Community Datasets: The “ImageNet” of Robotics — When and How?

Dana Aubakirova, Alexandre Chapin, Mustafa Shukor, Marina Barann · 2025-05-11 · via Hugging Face - Blog

🧭 TL;DR — Why This Blogpost?

In this post, we:

Recognize the growing impact of community-contributed LeRobot datasets
Highlight the current challenges in robotic data collection and curation
Share practical steps and best practices to maximize the impact of this collective effort

Our goal is to frame generalization as a data problem, and to show that building an open, diverse “ImageNet of robotics” is not just possible but already happening.

Introduction

Recent advances in Vision-Language-Action (VLA) models have enabled robots to perform a wide range of tasks—from simple commands like “grasp the cube” to more complex activities like folding laundry or cleaning a table. These models aim to achieve generalization: the ability to perform tasks in novel settings, with unseen objects, and in varying conditions.

“The biggest challenge in robotics isn’t dexterity, but generalization—across physical, visual, and semantic levels.”
— Physical Intelligence

A robot must "figure out how to correctly perform even a simple task in a new setting or with new objects," and this requires both robust skills and common-sense understanding of the world. Yet, progress is often limited by the availability of diverse data for such robotic systems.

“Generalization must occur at many levels. At the low level, the robot must understand how to pick up a spoon (by the handle) or plate (by the edge), even if it has not seen these specific spoons or plates before, and even if they are placed in a pile of dirty dishes. At a higher level, the robot must understand the semantics of each task—where to put clothes and shoes (ideally in the laundry hamper or closet, not on the bed), and what kind of tool is appropriate for wiping down a spill. This generalization requires both robust physical skills and a common-sense understanding of the environment, so that the robot can generalize at many levels at the same time, from physical, to visual, to semantic. This is made even harder by the limited availability of diverse data for such robotic systems.”
— Physical Intelligence

From Models to Data: Shifting the Perspective

At their core, generalist policies rest on a simple idea: co-training on heterogeneous datasets. By exposing VLA models to a variety of environments, tasks, and robot embodiments, we can teach models not only how to act, but also why: how to interpret a scene, understand a goal, and adapt skills across contexts.

💡 “Generalization is not just a model property—it’s a data phenomenon.”
It emerges from the diversity, quality, and abstraction level of the training data.

This brings us to a fundamental question:

Given current datasets, what is the upper limit of generalization we can expect?

Can a robot meaningfully respond to a completely novel prompt—say, *"set up a surprise birthday party"*—if it has never encountered anything remotely similar during training? Especially when most datasets are collected in academic labs, by a limited number of people, under well-controlled setups?

We take a data-centric view of generalization, treating it as the process of abstracting broader patterns from data: essentially “zooming out” to reveal task-agnostic structures and principles. This shift in perspective emphasizes the role of dataset diversity, rather than model architecture alone, in driving generalization.

Why does Robotics lack its ImageNet Moment?

So far, the majority of robotics datasets come from structured academic environments. Even if we scale up to millions of demonstrations, one dataset will often dominate, limiting diversity. Unlike ImageNet—which aggregated internet-scale data and captured the real world more holistically—robotics lacks a comparably diverse, community-driven benchmark.

This is largely because collecting data for robotics requires physical hardware and significant effort.

Building a LeRobot Community

That’s why, at LeRobot, we’re working to make robotics data collection more accessible—at home, at school, or anywhere. We're:

Simplifying the recording pipeline
Streamlining uploading to the Hugging Face Hub, to foster community sharing
Reducing hardware costs

We're already seeing the results: the number of community-contributed datasets on the Hub is growing rapidly.

Growth of lerobot datasets on the Hugging Face Hub over time.

If we break down the uploaded datasets by robot type, we see that most contributions are to So100 and Koch, making robotic arms and manipulation tasks the primary focus of the current LeRobot dataset landscape. However, it’s important to remember that the potential reaches far beyond. Domains like autonomous vehicles, assistive robots, and mobile navigation stand to benefit just as much from shared data. This momentum brings us closer to a future where datasets reflect a global effort, not just the contributions of a single lab or institution.

Distribution of lerobot datasets by robot type.

Here are just a few standout community-contributed datasets that show how diverse and imaginative robotics can be:

lirislab/close_top_drawer_teabox: precise manipulation with a household drawer
Chojins/chess_game_001_blue_stereo: a full chess match captured from a stereo camera setup
pierfabre/chicken: yes — a robot interacting with colorful animal figures, including a chicken 🐔

Explore additional creative datasets under the LeRobot tag on the Hugging Face Hub, and interactively view them in the LeRobot Dataset Visualizer.

Scaling Responsibly

As robotics data collection becomes more democratized, curation becomes the next challenge. While these datasets are still collected in constrained setups, they are a crucial step toward affordable, general-purpose robotic policies. Not everyone has access to expensive hardware—but with shared infrastructure and open collaboration, we can build something far greater.

🧠 “Generalization isn’t solved in a lab—it’s taught by the world.”
The more diverse our data, the more capable our models will be.

Better data = Better models

Why does data quality matter? Poor-quality data results in poor downstream performance, biased outputs, and models that fail to generalize. Hence, efficient and high-quality data collection plays a critical role in advancing generalist robotic policies.

While foundation models in vision and language have thrived on massive, web-scale datasets, robotics lacks an “Internet of robots”—a vast, diverse corpus of real-world interactions. Instead, robotic data is fragmented across different embodiments, sensor setups, and control modes, forming isolated data islands.

To overcome this, recent approaches like Gr00t organize training data as a pyramid, where:

Large-scale web and video data form the foundation
Synthetic data adds simulated diversity
Real-world robot interactions at the top ground the model in physical execution

Within this framework, efficient real-world data collection is indispensable—it anchors learned behaviors in actual robotic hardware and closes the sim-to-real gap, ultimately improving the generalization, adaptability, and performance of robotics foundation models.

By expanding the volume and diversity of real-world datasets, we reduce fragmentation between heterogeneous data sources. When datasets are disjoint in terms of environment, embodiment, or task distribution, models struggle to transfer knowledge across domains.

🔗 Real-world data acts as connective tissue—it aligns abstract priors with grounded action and enables the model to build more coherent and transferable representations.

As a result, increasing the proportion of real robot interactions does not merely enhance realism—it structurally reinforces the links between all layers of the pyramid, leading to more robust and capable policies.

Data Pyramid for Robot Foundation Model Training. Adapted from Gr00t (Yang et al., 2025). Data quantity decreases while embodiment specificity increases from bottom to top.

Challenges with Current Community Datasets

At LeRobot, we’ve started developing an automatic curation pipeline to post-process community datasets. During the post-processing phase, we’ve identified several areas where improvements can further boost dataset quality and facilitate more effective curation going forward:

1. Incomplete or Inconsistent Task Annotations

Many datasets lack task descriptions entirely, or provide descriptions that are too sparse or too ambiguous to identify the task. Semantics is at the core of a robot's cognition: understanding the context and specifics of a task is crucial for robotic performance. Detailed instructions tell the robot exactly what is expected, and they also give the cognition system a broader knowledge and vocabulary. Ambiguity can lead to incorrect interpretation and, consequently, incorrect actions.

Task instructions can be:

Empty
Too short (e.g. “Hold”, “Up”)
Without any specific meaning (e.g. “task desc”, “desc”)

Subtask-level annotations are often missing, making it difficult to model complex task hierarchies.
While a VLM can fill in missing annotations, annotations provided by the dataset's author are still preferable.

2. Feature Mapping Inconsistencies

Features like images.laptop are ambiguously labeled:

Sometimes it's a third-person view
Other times it's more like a gripper (wrist) camera

Manual mapping of dataset features to standardized names is time-consuming and error-prone.
We could potentially automate feature type inference by using VLMs or computer vision models to classify camera perspectives. Even so, choosing unambiguous feature names at recording time leads to a cleaner dataset.
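The manual mapping step described above can be sketched as a per-dataset lookup table. This is only an illustration: the dataset IDs, the `FEATURE_MAP` table, and `remap_features` are hypothetical and are not part of the LeRobot API. The point it shows is that the same source key (here `images.laptop`) may need a different standardized name in each dataset.

```python
# Hypothetical per-dataset mapping (assumed for illustration, not real datasets).
# The same source key can correspond to different camera views.
FEATURE_MAP = {
    "user_a/pick_cube": {"images.laptop": "images.front", "images.phone": "images.top"},
    "user_b/stack_blocks": {"images.laptop": "images.wrist.top"},
}


def remap_features(dataset_id: str, keys: list[str]) -> dict[str, str]:
    """Return {original_key: standardized_key}; unmapped keys are kept unchanged."""
    mapping = FEATURE_MAP.get(dataset_id, {})
    return {key: mapping.get(key, key) for key in keys}


if __name__ == "__main__":
    print(remap_features("user_a/pick_cube", ["images.laptop", "action"]))
    print(remap_features("user_b/stack_blocks", ["images.laptop"]))
```

Keeping the table explicit makes each manual decision reviewable, which matters because the mapping cannot be inferred from the key name alone.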

3. Low-Quality or Incomplete Episodes

Some datasets contain:

Episodes with only 1 or very few frames
Manually deleted data files (e.g., deleted .parquet files without reindexing), breaking the sequential consistency.
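Both problems can be detected from episode indices and frame counts alone. The sketch below assumes you have already extracted a mapping from episode index to number of frames (how you obtain it depends on your dataset version), and that episode indices are expected to run contiguously from 0. The function name and the `min_frames` threshold are illustrative choices, not LeRobot defaults.

```python
def find_episode_issues(episode_lengths: dict[int, int], min_frames: int = 10) -> dict[str, list[int]]:
    """Flag very short episodes and gaps in the episode index sequence.

    episode_lengths maps episode index -> number of frames.
    min_frames is an assumed threshold; tune it for your task.
    """
    indices = sorted(episode_lengths)
    too_short = [i for i in indices if episode_lengths[i] < min_frames]
    # Indices are assumed to run 0..max without gaps; a gap suggests files
    # were deleted without reindexing.
    present = set(indices)
    expected = range(indices[-1] + 1) if indices else range(0)
    missing = [i for i in expected if i not in present]
    return {"too_short": too_short, "missing": missing}


if __name__ == "__main__":
    print(find_episode_issues({0: 120, 1: 1, 3: 95}))
```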

4. Inconsistent Action/State Dimensions

Different datasets use different action or state dimensions, even for the same robot (e.g., so100).
Some datasets show inconsistencies in action/state format.
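One way to surface such mismatches is to group datasets by robot type and by their action/state dimensions. The sketch below works on plain dictionaries; the field names (`name`, `robot_type`, `action_dim`, `state_dim`) and the sample values are assumptions for illustration and would need to be filled in from each dataset's metadata.

```python
from collections import defaultdict


def group_by_dims(datasets: list[dict]) -> dict[str, dict[tuple[int, int], list[str]]]:
    """Group dataset names by robot type, then by (action_dim, state_dim)."""
    groups = defaultdict(lambda: defaultdict(list))
    for ds in datasets:
        dims = (ds["action_dim"], ds["state_dim"])
        groups[ds["robot_type"]][dims].append(ds["name"])
    return {robot: dict(by_dims) for robot, by_dims in groups.items()}


def inconsistent_robots(datasets: list[dict]) -> list[str]:
    """Robot types whose datasets disagree on action/state dimensions."""
    return sorted(
        robot for robot, by_dims in group_by_dims(datasets).items() if len(by_dims) > 1
    )


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        {"name": "a/ds1", "robot_type": "so100", "action_dim": 6, "state_dim": 6},
        {"name": "b/ds2", "robot_type": "so100", "action_dim": 7, "state_dim": 6},
        {"name": "c/ds3", "robot_type": "koch", "action_dim": 6, "state_dim": 6},
    ]
    print(inconsistent_robots(sample))
```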

What Makes a Good Dataset?

Because a high-quality dataset is essential for training reliable and generalizable robot policies, we have compiled a checklist of best practices to help you collect effective data.

Image Quality

✅ Preferably use two camera views
✅ Ensure steady video capture (no shaking)
✅ Maintain neutral, stable lighting (avoid overly yellow or blue tones)
✅ Ensure consistent exposure and sharp focus
✅ Leader arm should not appear in the frame
✅ The only moving objects should be the follower arm and the manipulated items (avoid human limbs/bodies)
✅ Use a static, non-distracting background, or apply controlled variations
✅ Record in high resolution (at least 480x640; 720p if possible)

Metadata & Recording Protocol

✅ Select the correct robot type in the metadata. If you're using a custom robot that's not listed in the official LeRobot config registry, we recommend checking how similar robots are named in existing datasets on the LeRobot Hub to ensure consistency.
✅ Record videos at approximately 30 frames per second (FPS)
✅ If deleting episodes, make sure to update the metadata files accordingly (we will provide proper tools to edit datasets)

Feature Naming Conventions

Use a consistent and interpretable naming scheme for all camera views and observations:

Format:

<modality>.<location>

Examples:

images.top
images.front
images.left
images.right

Avoid device-specific names:

❌ images.laptop
❌ images.phone

For wrist-mounted cameras, specify orientation:

images.wrist.left
images.wrist.right
images.wrist.top
images.wrist.bottom

Consistent naming improves clarity and helps downstream models better interpret spatial configurations and multi-view inputs.
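The convention above is simple enough to check automatically before uploading. The following is a minimal sketch, assuming lowercase dot-separated keys. The regular expression, the set of device names (taken from the two examples above), and the function name are illustrative and not an official LeRobot validator.

```python
import re

# Assumed shape: lowercase segments separated by dots, at least two segments.
KEY_PATTERN = re.compile(r"^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+$")
DEVICE_NAMES = {"laptop", "phone"}  # extend with other device-specific names
WRIST_ORIENTATIONS = {"left", "right", "top", "bottom"}


def check_feature_key(key: str) -> list[str]:
    """Return a list of naming problems for one feature key (empty if none)."""
    if not KEY_PATTERN.match(key):
        return ["not in <modality>.<location> form"]
    problems = []
    location = key.split(".")[1:]
    if any(part in DEVICE_NAMES for part in location):
        problems.append("device-specific name")
    if "wrist" in location and location[-1] not in WRIST_ORIENTATIONS:
        problems.append("wrist camera without orientation")
    return problems


if __name__ == "__main__":
    for key in ["images.top", "images.laptop", "images.wrist", "images.wrist.left"]:
        print(key, check_feature_key(key))
```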

Task Annotation

✅ Use the task field to clearly describe the robot’s objective
- Example: Pick the yellow lego block and put it in the box
✅ Keep task descriptions concise (25 to 50 characters)
✅ Avoid vague or generic names like task1, demo2, etc.
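These rules can be turned into a quick pre-upload check. The sketch below is an assumption-laden illustration: the placeholder list and the generic-name pattern are built only from the examples mentioned in this post (“task desc”, “desc”, task1, demo2), the length bounds are the 25 to 50 characters suggested above, and `check_task` is a hypothetical helper rather than a LeRobot function.

```python
import re

PLACEHOLDERS = {"task desc", "desc"}  # examples of meaningless descriptions
GENERIC_NAME = re.compile(r"^(task|demo)\s*\d*$", re.IGNORECASE)  # task1, demo2, ...


def check_task(task: str, min_len: int = 25, max_len: int = 50) -> list[str]:
    """Return a list of problems with a task description (empty if none)."""
    text = task.strip()
    if not text:
        return ["empty"]
    problems = []
    if text.lower() in PLACEHOLDERS or GENERIC_NAME.match(text):
        problems.append("placeholder or generic name")
    if len(text) < min_len:
        problems.append("too short")
    elif len(text) > max_len:
        problems.append("too long")
    return problems


if __name__ == "__main__":
    for task in ["", "Hold", "task1", "Pick the yellow lego block and put it in the box"]:
        print(repr(task), check_task(task))
```

A length check is only a rough proxy for quality, so a description that passes should still be read by a person.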

Below, we provide a checklist that serves as a guideline for recording datasets, outlining key points to keep in mind during the data collection process.

Figure 4: Dataset Recording Checklist – a step-by-step guide to ensure consistent and high-quality real-world data collection.

How Can You Help?

The next generation of generalist robots won't be built by a single person or lab — they'll be built by all of us. Whether you're a student, a researcher, or just robot-curious, here’s how you can jump in:

🎥 Record your own datasets — Use LeRobot tools to capture and upload good quality datasets from your robots.
🧠 Improve dataset quality — Follow our checklist, clean up your recordings, and help set new standards for robotics data.
📦 Contribute to the Hub — Upload datasets, share examples, and explore what others are building.
💬 Join the conversation — Give feedback, request features, or help shape the roadmap by engaging in our LeRobot Discord Server.
🌍 Grow the movement — Introduce LeRobot to your club, classroom, or lab. More contributors = better generalization.

Start recording, start contributing—because the future of generalist robots depends on the data we build today.
