DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies

This article was generated by AI analysis.

Created: 2026-03-25 Source: https://dexwild.github.io/

Summary

DexWild (CMU) enables scalable collection of dexterous robot training data by having humans wear a low-cost device (DexWild-System) instead of teleoperating a robot. Co-training on a large human demonstration dataset plus a small robot demonstration dataset produces policies whose success rate in unseen environments is roughly 4× that of robot-data-only training (68.5% vs. ~17%).

Prerequisites

Imitation learning / behavior cloning — DexWild policies are trained via imitation, not RL
Sim-to-real and embodiment gap — key challenge is bridging human hand ↔ robot hand action spaces
Cross-embodiment transfer — the 5.8× better cross-embodiment result requires understanding how policies are shared across robot hardware

Core Idea

Teleoperation provides high-quality data but scales poorly. DexWild-System is a portable, low-cost wearable that lets untrained operators collect data naturally in any environment: 9,290 demonstrations across 93 environments, collected 4.6× faster than with robot teleoperation. The key insight is co-training. Neither human-only nor robot-only data generalizes well alone, but their combination gives the policy both visual diversity (from human data in varied environments) and robot-specific grounding (from robot data). This is loosely analogous to how LLMs combine pretraining on broad internet data with fine-tuning on task-specific data.
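The co-training idea above can be sketched as a batch sampler that draws from both data sources in a fixed proportion. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sample_cotraining_batch`, the `human_fraction` parameter, and the 0.75 value used below are hypothetical, since this note does not state the mixing ratio DexWild uses.

```python
import random
from typing import List, Sequence, Tuple


def sample_cotraining_batch(
    human_demos: Sequence[str],
    robot_demos: Sequence[str],
    batch_size: int,
    human_fraction: float,
    rng: random.Random,
) -> List[Tuple[str, str]]:
    """Draw one mixed batch of (source, demo) pairs.

    human_fraction is a hypothetical knob: the share of each batch
    taken from the human dataset. The rest comes from robot data.
    """
    if not 0.0 <= human_fraction <= 1.0:
        raise ValueError("human_fraction must be in [0, 1]")

    n_human = round(batch_size * human_fraction)
    n_robot = batch_size - n_human
    if (n_human > 0 and not human_demos) or (n_robot > 0 and not robot_demos):
        raise ValueError("cannot sample from an empty dataset")

    # Sample with replacement so the small robot dataset can be
    # revisited as often as the chosen ratio requires.
    batch = [("human", rng.choice(human_demos)) for _ in range(n_human)]
    batch += [("robot", rng.choice(robot_demos)) for _ in range(n_robot)]
    rng.shuffle(batch)  # avoid a fixed human-then-robot ordering
    return batch


# Usage with placeholder demo IDs (assumed, not from the dataset).
rng = random.Random(0)
batch = sample_cotraining_batch(
    ["h1", "h2", "h3"], ["r1"], batch_size=8, human_fraction=0.75, rng=rng
)
```

Sampling with replacement is one simple way to keep the small robot dataset represented in every batch; a real training pipeline might instead weight a dataset-level sampler.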

Results

Setting	DexWild	Robot-only	Improvement
Unseen environments	68.5% success	~17%	~4× higher
Cross-embodiment	—	baseline	5.8× better
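
As a quick arithmetic check on the first row, the "~4×" figure follows from the two success rates in the table. The helper name `improvement_ratio` is hypothetical, and because the robot-only rate is only given approximately (~17%), the ratio is approximate too.

```python
def improvement_ratio(new_rate: float, baseline_rate: float) -> float:
    """Return how many times larger new_rate is than baseline_rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return new_rate / baseline_rate


# Unseen-environment success rates from the table (percent).
ratio = improvement_ratio(68.5, 17.0)
print(f"{ratio:.2f}x")  # prints "4.03x"
```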

Limitations

Author-stated: DexWild-System evaluated on specific hand configurations; not all dexterous morphologies supported
Unstated: The 93 environments may still be a narrow distribution compared to real-world diversity; success metrics are task-specific
Unstated: Human demonstration quality variance (untrained operators) could introduce noise; ablations needed to quantify this

Reproducibility

Code: Available at https://dexwild.github.io
Datasets: 9,290 human demonstrations across 93 environments; released
Compute: Standard GPU training for imitation learning; data collection is the main cost driver

Insights

DexWild essentially applies the “pretraining on broad data + fine-tuning” paradigm from NLP to robotics at the data collection level. The 4.6× data collection speedup from having humans demonstrate directly is the key enabler: it lowers the economic barrier to building large, diverse datasets. The cross-embodiment result (5.8×) is also notable. Human hand data seems to transfer across robot hardware better than robot-specific data, possibly because human demonstrations capture task-relevant features rather than hardware-specific motions.

Connections

Raw Excerpt

DexWild enables dexterous policies to generalize to new objects, scenes, and embodiments. This is achieved by leveraging large-scale, real-world human embodiment data collected in many scenes and co-trained with a smaller robot embodiment dataset for grounding.
