Summary

MimicGen is a data generation system that amplifies a small number of human demonstrations (roughly 10–200) into over 50,000 diverse synthetic demonstrations across 18 manipulation tasks, targeting the data-scale bottleneck in robot learning. It decomposes demonstrations into object-centric segments, rigidly transforms them to new scene configurations, and validates each generated trajectory by executing it in physics simulation. Policies trained on MimicGen data achieve 59–96% success depending on task complexity.

Prerequisites

  • Imitation learning / Behavior Cloning — MimicGen generates data to train BC policies; understanding IL is needed to interpret the results
  • Object-centric representations — the decomposition step assumes rigid-body object-centric segments; scenes with deformable or freely interacting objects break the assumption
  • Robot simulation (MuJoCo/Robosuite) — generation and validation happen entirely in simulation; understanding sim physics helps interpret success/failure rates
  • HDF5 dataset format — output stored in HDF5 compatible with Robomimic; knowing the format helps with downstream training pipeline

Core Idea

The key insight is that robot manipulation demonstrations have an object-centric structure: each subtask is a motion relative to a specific object, and that relative motion can be carried to a new scene configuration with a rigid-body SE(3) transform. MimicGen exploits this by (1) segmenting human demos at subtask boundaries, (2) expressing each segment's end-effector motion in the frame of its reference object and re-applying it at the object's new pose, and (3) stitching the transformed segments together with short interpolated transitions. Physics simulation acts as an automatic quality filter: every generated trajectory is executed, and those that fail the task are discarded.
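The per-segment transform in step (2) is a single matrix product. The sketch below assumes 4x4 homogeneous world-frame poses stored as NumPy arrays and maps a source segment's end-effector poses so that their pose relative to the reference object is preserved when the object moves: T_eef_new = T_obj_new · T_obj_src⁻¹ · T_eef_src. The helper names `pose` and `transform_segment` are hypothetical (not MimicGen's API), and the interpolation that bridges consecutive segments is omitted.

```python
import numpy as np

def pose(yaw: float, xyz) -> np.ndarray:
    """Hypothetical helper: 4x4 homogeneous transform with a rotation of `yaw` rad about z
    followed by a translation `xyz` (world frame)."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = xyz
    return T

def transform_segment(src_eef: np.ndarray, src_obj: np.ndarray, new_obj: np.ndarray) -> np.ndarray:
    """Re-target a source segment of end-effector poses (shape (N, 4, 4)) so that its motion
    relative to the reference object is unchanged when the object moves from `src_obj`
    to `new_obj` (both 4x4 world-frame poses)."""
    # inv(src_obj) @ src_eef gives object-relative poses; new_obj @ (...) re-expresses them in the world.
    delta = new_obj @ np.linalg.inv(src_obj)
    return delta @ src_eef  # matmul broadcasts the 4x4 delta over all N poses

# Toy segment: a 3-step top-down approach to an object, which is then moved and rotated 90 degrees.
src_obj = pose(0.0, [0.5, 0.0, 0.8])
src_eef = np.stack([pose(0.0, [0.5, 0.0, 0.8 + h]) for h in (0.3, 0.2, 0.1)])
new_obj = pose(np.pi / 2, [0.3, 0.4, 0.8])
new_eef = transform_segment(src_eef, src_obj, new_obj)

# The pose relative to the object is preserved by construction.
rel_src = np.linalg.inv(src_obj) @ src_eef
rel_new = np.linalg.inv(new_obj) @ new_eef
print(np.allclose(rel_src, rel_new))  # True
print(new_eef[-1, :3, 3])  # approximately [0.3, 0.4, 0.9]: 0.1 above the object's new position
```

Because the whole segment shares one `delta`, the transform is rigid: distances and orientations along the trajectory are unchanged, which is exactly why deformable objects (whose shape, not just pose, changes) fall outside the method.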

Results

Metric                                        Result
Demos generated from <200 human demos         50,000+
Data multiplier (10 demos → ~1,000)           ~100×
BC success range across 18 tasks              59–96%
Square task: generated vs. human demos        79% vs. 84% (comparable)
DexMimicGen (2024): bimanual humanoid         Scales to 22-DoF systems

Limitations

  • Author-stated: assumes rigid-body objects; deformable objects (cloth, fluids) not supported
  • Author-stated: generated data quality degrades for tasks requiring many sequential decisions
  • Unstated: the 50–70% generation success rate means ~30–50% of compute is wasted on failed trajectories
  • Unstated: demonstrations generated in simulation inherit the sim-to-real gap, so policies trained on them likely need real-world fine-tuning before deployment
  • Unstated: generated trajectories are of mixed quality; RoboCasa365 found MimicGen data lower in quality than human demos, though the larger scale compensates
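The compute cost behind the generation-success limitation can be made concrete. Assuming each generated trajectory passes the simulation check independently with probability p, keeping N demos takes N / p attempts on average (the mean of a negative binomial). The sketch below computes this and shows the rejection loop itself; `expected_attempts`, `generate_with_filter`, and the `try_generate` callable are hypothetical names, and the success rates are just the 50–70% range quoted above.

```python
import math
import random
from typing import Callable, List, Tuple

def expected_attempts(target_successes: int, success_rate: float) -> int:
    """Average attempts needed to keep `target_successes` trajectories when each attempt
    independently succeeds with probability `success_rate` (negative-binomial mean, rounded up)."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return math.ceil(target_successes / success_rate)

def generate_with_filter(
    target_successes: int,
    try_generate: Callable[[], Tuple[object, bool]],
    max_attempts: int,
) -> Tuple[List[object], int]:
    """Rejection loop: keep successful rollouts until the target or the attempt budget is reached.
    `try_generate` is a hypothetical callable returning (trajectory, task_succeeded)."""
    kept: List[object] = []
    attempts = 0
    while len(kept) < target_successes and attempts < max_attempts:
        attempts += 1
        trajectory, succeeded = try_generate()
        if succeeded:  # failed rollouts are discarded; their compute is the "waste"
            kept.append(trajectory)
    return kept, attempts

for rate in (0.5, 0.7):
    n = expected_attempts(1_000, rate)
    print(f"success rate {rate:.0%}: ~{n} attempts, ~{n - 1_000} discarded")
# success rate 50%: ~2000 attempts, ~1000 discarded
# success rate 70%: ~1429 attempts, ~429 discarded

# Simulated rollouts that succeed 60% of the time (a stand-in for real simulation).
rng = random.Random(0)
kept, attempts = generate_with_filter(100, lambda: (None, rng.random() < 0.6), max_attempts=10_000)
print(len(kept), attempts)  # 100 kept; attempts land near 100 / 0.6, i.e. about 167
```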

Reproducibility

  • Code: open-source at github.com/NVlabs/mimicgen
  • Datasets: Robosuite simulation environments (MuJoCo); 18 task environments provided
  • Compute: demo generation ~30 min/1,000 demos on GPU; BC training ~30 min/1,000 epochs

Insights

MimicGen inverts the conventional data collection bottleneck: instead of asking “how do we collect more demonstrations?” it asks “how do we extract more information from the demonstrations we have?” The object-centric decomposition is the enabling insight — it means demonstrations are not rigid sequences but composable motion primitives that can be recombined.
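The "composable primitives" view can be sketched in two steps: split each source demo at subtask boundaries, then assemble a new plan by drawing each subtask's segment from a possibly different source demo. The sketch assumes per-subtask binary completion signals and uniform random selection of the source demo; `segment_demo` and `recompose` are hypothetical helpers, not MimicGen's API. In the full system each chosen segment would also be re-targeted to the current object pose, bridged by interpolation, and discarded if the rollout fails.

```python
import random
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open [start, end) range of timesteps within one source demo

def segment_demo(term_signals: Sequence[Sequence[int]], num_steps: int) -> List[Segment]:
    """Split one demo into subtask segments. term_signals[k][t] == 1 once subtask k is complete
    at step t (one signal per subtask except the last, in task order); subtask k ends at the
    first step where its signal turns on."""
    segments: List[Segment] = []
    start = 0
    for signal in term_signals:
        end = list(signal).index(1) + 1  # include the step at which subtask k completes
        segments.append((start, end))
        start = end
    segments.append((start, num_steps))  # the final subtask runs to the end of the demo
    return segments

def recompose(all_segments: List[List[Segment]], rng: random.Random) -> List[Tuple[int, Segment]]:
    """For each subtask index, pick the segment of a uniformly chosen source demo."""
    num_subtasks = len(all_segments[0])
    plan = []
    for k in range(num_subtasks):
        demo_id = rng.randrange(len(all_segments))
        plan.append((demo_id, all_segments[demo_id][k]))
    return plan

# Two toy 10-step source demos, each with 3 hypothetical subtasks (e.g. grasp, lift, place).
demo_a = segment_demo([[0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
                       [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]], num_steps=10)
demo_b = segment_demo([[0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
                       [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]], num_steps=10)
print(demo_a)  # [(0, 4), (4, 7), (7, 10)]
print(demo_b)  # [(0, 6), (6, 8), (8, 10)]
print(recompose([demo_a, demo_b], random.Random(0)))  # one (demo_id, segment) per subtask
```

With only a handful of source demos, the number of distinct plans already grows as (number of demos)^(number of subtasks), before any change in object placement is counted.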

The comparable performance of generated vs human data (79% vs 84% on Square) raises a deeper question: if generated data quality is close to human data quality at a fraction of the cost, what is the marginal value of human teleoperation for well-covered tasks? The answer likely depends on task complexity — for long-horizon, contact-rich tasks, human judgment still dominates.

DexMimicGen (2024) extending this to bimanual humanoid hands (22 DoF) is a significant step: it suggests the object-centric decomposition paradigm scales to more complex morphologies.

Raw Excerpt

“We generate over 50,000 demonstrations from less than 200 human demonstrations across 18 tasks, multiple simulators, and the real-world.”