This note was generated by AI analysis.
Created: 2026-04-05 Source: https://arxiv.org/abs/2310.17596
Summary
MimicGen is a data generation system that amplifies a small number of human demonstrations (roughly 10–200) into more than 50,000 diverse synthetic demonstrations across 18 manipulation tasks, targeting the data-scale bottleneck in robot learning. It decomposes each demonstration into object-centric segments, rigidly transforms those segments to new scene configurations, and validates the resulting trajectories in physics simulation. Policies trained on MimicGen data achieve 59–96% success depending on task complexity.
Prerequisites
- Imitation learning / behavior cloning (BC): MimicGen generates data for training BC policies, so familiarity with imitation learning is needed to interpret the results
- Object-centric representations: the decomposition step assumes rigid-body, object-centric segments; scenes with deformable or freely interacting objects break this assumption
- Robot simulation (MuJoCo/Robosuite): generation and validation happen entirely in simulation, so understanding simulated physics helps interpret success and failure rates
- HDF5 dataset format: output is stored in HDF5 files compatible with Robomimic; knowing the format helps when setting up the downstream training pipeline
Core Idea
The key insight is that robot manipulation demonstrations have an object-centric structure: each subtask is a motion relative to a specific object, and that relative motion can be carried over to new scene configurations with a rigid-body SE(3) transform. MimicGen exploits this by (1) segmenting human demos at subtask boundaries, (2) retaining each segment's object-relative motion, and (3) recomposing the transformed segments, with short interpolated motions bridging the transitions between them. Physics simulation then acts as an automatic quality filter: a generated trajectory that fails the task in simulation is simply discarded.
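The object-relative transform described above can be sketched in a few lines of NumPy. This is a minimal illustration, not MimicGen's actual code: the function names `make_pose` and `transform_segment` are hypothetical, poses are assumed to be 4x4 homogeneous matrices in the world frame, and the example poses are made up.

```python
import numpy as np

def make_pose(yaw_rad, translation):
    """Build a 4x4 homogeneous pose from a rotation about z and a translation."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    pose = np.eye(4)
    pose[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    pose[:3, 3] = translation
    return pose

def transform_segment(src_eef_poses, src_obj_pose, new_obj_pose):
    """Re-target a segment of end-effector poses to a new object pose.

    Each world-frame pose is implicitly expressed relative to the source
    object and then mapped back to the world using the new object pose.
    """
    # One rigid transform that carries the old object frame onto the new one.
    world_shift = new_obj_pose @ np.linalg.inv(src_obj_pose)
    # (4, 4) @ (T, 4, 4) broadcasts over the T poses in the segment.
    return world_shift @ np.asarray(src_eef_poses)

# Hypothetical source demo: the gripper descends onto an object.
src_obj = make_pose(0.0, [0.5, 0.0, 0.8])
segment = np.stack(
    [make_pose(0.0, [0.5, 0.0, 0.8 + h]) for h in (0.20, 0.10, 0.02)]
)

# New scene: the object has moved and is rotated 90 degrees about z.
new_obj = make_pose(np.pi / 2, [0.3, 0.2, 0.8])
new_segment = transform_segment(segment, src_obj, new_obj)
```

The transformed segment keeps the same gripper-to-object relationship as the source segment, which is the property the decomposition relies on. Whether the resulting motion is actually feasible is left to the simulation check.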
Results
| Metric | Result |
|---|---|
| Demos generated from <200 human demos | 50,000+ |
| Data multiplier (10 demos → ~1,000) | ~100× |
| BC success range across 18 tasks | 59–96% |
| Square task: generated vs human demo | 79% vs 84% (comparable) |
| DexMimicGen (2024): humanoid bimanual | Scales to 22-DoF systems |
Limitations
- Author-stated: assumes rigid-body objects; deformable objects (cloth, fluids) not supported
- Author-stated: generated data quality degrades for tasks requiring many sequential decisions
- Unstated: the 50–70% generation success rate means ~30–50% of generation attempts fail and are discarded, so that share of simulation compute produces no usable data
- Unstated: the sim-to-real gap means generated demonstrations need real-world fine-tuning before deployment
- Unstated: generated trajectories are of mixed quality; RoboCasa365 found MimicGen data to be lower quality than human demos, though scale compensates
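To make the cost of the 50–70% generation success rate concrete, the arithmetic can be worked through as below. This is a rough sketch that assumes every attempt succeeds independently with the same probability; the helper name `attempts_needed` is hypothetical.

```python
import math

def attempts_needed(target_demos, success_rate):
    """Expected number of attempts to collect target_demos successes."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return math.ceil(target_demos / success_rate)

# Both ends of the 50-70% range quoted in this note.
for rate in (0.5, 0.7):
    attempts = attempts_needed(1000, rate)
    discarded = attempts - 1000
    print(f"success rate {rate:.0%}: {attempts} attempts, {discarded} discarded")
```

Under these assumptions, 1,000 usable demos take about 2,000 attempts at a 50% success rate and about 1,429 at 70%, so the overhead is roughly 1.4× to 2× the number of demos kept.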
Reproducibility
- Code: open-source at github.com/NVlabs/mimicgen
- Datasets: Robosuite simulation environments (MuJoCo); 18 task environments provided
- Compute: demo generation ~30 min/1,000 demos on GPU; BC training ~30 min/1,000 epochs
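As a back-of-the-envelope check on the generation figure above, the sketch below scales ~30 min per 1,000 demos up to the full 50,000-demo dataset. It assumes linear scaling on a single GPU with no parallelism, which the note does not confirm; `generation_hours` is a hypothetical helper.

```python
def generation_hours(num_demos, minutes_per_thousand=30.0):
    """Estimate generation wall-clock hours, assuming linear scaling."""
    return num_demos / 1000.0 * minutes_per_thousand / 60.0

# Full dataset size quoted in this note.
full_dataset_hours = generation_hours(50_000)
print(f"{full_dataset_hours:.1f} GPU-hours")
```

Under that assumption the full dataset comes to roughly 25 GPU-hours of generation time.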
Insights
MimicGen inverts the conventional data collection bottleneck: instead of asking “how do we collect more demonstrations?” it asks “how do we extract more information from the demonstrations we have?” The object-centric decomposition is the enabling insight — it means demonstrations are not rigid sequences but composable motion primitives that can be recombined.
The comparable performance of generated vs human data (79% vs 84% on Square) raises a deeper question: if generated data quality is close to human data quality at a fraction of the cost, what is the marginal value of human teleoperation for well-covered tasks? The answer likely depends on task complexity — for long-horizon, contact-rich tasks, human judgment still dominates.
DexMimicGen (2024) extending this to bimanual humanoid hands (22 DoF) is a significant step: it suggests the object-centric decomposition paradigm scales to more complex morphologies.
Connections
- robocasa365-large-scale-simulation-generalist-robots
- Clippings-datalab-output-2510.10903v1.pdf
- Clippings-lerobot-imitation-learning-field-report-ml6
Raw Excerpt
“We generate over 50,000 demonstrations from less than 200 human demonstrations across 18 tasks, multiple simulators, and the real-world.”