Interactive Imitation Learning for Dexterous Robotic Manipulation

This article was generated by AI analysis.

Created: 2026-03-26 Source: https://arxiv.org/abs/2506.00098

Summary

This May 2025 survey addresses covariate shift: the distribution mismatch between training demonstrations and deployment states that causes compounding errors in behavior cloning (BC). For dexterous manipulation with hands of 21 or more degrees of freedom (DOF), this failure mode is especially severe. The paper surveys interactive imitation learning (IL), in which humans provide corrective feedback during robot execution. It covers DAgger variants, diffusion-based policies, and human-in-the-loop (HITL) techniques, and presents them as promising but underexplored directions.

Prerequisites

Covariate shift and its consequences: why small errors compound; once BC reaches an out-of-distribution (OOD) state, subsequent actions become unreliable and the errors build on one another; this is the central problem
DAgger (Dataset Aggregation): the canonical interactive IL algorithm; understanding its data aggregation loop is essential
Diffusion policies: why modeling full action distributions (not point estimates) is more robust to covariate shift
High-dimensional control spaces: 21+ DOF dexterous hands; understanding why this is fundamentally harder than controlling a 6-DOF arm helps contextualize the motivation
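
The compounding effect named in the first prerequisite is usually quantified with the bounds below. They are the standard results from the imitation-learning theory literature (Ross and Bagnell for behavior cloning; Ross, Gordon and Bagnell for DAgger), stated in simplified form; they are not taken from the survey itself, and the symbols are introduced here for illustration.

```latex
% \epsilon : per-step probability that the learner deviates from the expert
%            (for BC, measured on the expert's state distribution;
%             for DAgger, on the states the learner itself induces)
% T        : task horizon in time steps
% J(\pi)   : expected total cost of policy \pi, with per-step cost in [0, 1]
% u        : task-dependent bound on the extra cost of one recoverable mistake
\begin{align}
  \text{Behavior cloning:} \quad
    & J(\hat{\pi}) \le J(\pi^{*}) + T^{2}\,\epsilon \\
  \text{Interactive (DAgger-style):} \quad
    & J(\hat{\pi}) \le J(\pi^{*}) + O(u\,T\,\epsilon)
\end{align}
```

The quadratic term reflects that one early mistake can leave the policy off-distribution for the rest of the episode; interactive labeling brings the dependence on the horizon back to linear, provided the expert can recover from mistakes.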

Core Idea

Standard behavior cloning is brittle: small prediction errors push the robot into states the expert never demonstrated, where the policy fails. The fix is iterative: run the learned policy, have the expert label the states the robot actually visits, add those labels to the dataset, and retrain (DAgger), which progressively closes the distribution gap. Diffusion policies partially mitigate the problem by producing committed, non-averaged actions even in ambiguous states. HITL goes further: humans actively correct the robot during deployment rather than only providing upfront demonstrations, which lets distribution shift be corrected in real time.
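
The data aggregation loop described above can be sketched on a toy problem. This is a minimal illustration, not code from the survey: the 1-D dynamics, the `expert_action` function, the linear policy, and all parameter values are assumptions chosen to keep the example short. The expert is deliberately nonlinear, so the best linear fit depends on which states are in the dataset.

```python
import numpy as np

def expert_action(state):
    # Hypothetical expert: push the 1-D state back toward zero.
    return -np.tanh(state)

def fit_linear_policy(states, actions):
    # Least-squares fit of action = weight * state (no bias term).
    states = np.asarray(states)
    actions = np.asarray(actions)
    return float(states @ actions / (states @ states))

def rollout(weight, start, horizon, rng, noise=0.1):
    # Execute the learner's policy and return the states it visits.
    state, visited = start, []
    for _ in range(horizon):
        visited.append(state)
        state = state + weight * state + noise * rng.standard_normal()
    return visited

def dagger(iterations=5, horizon=20, seed=0):
    rng = np.random.default_rng(seed)
    # Iteration 0: plain behavior cloning. Uniform samples stand in for
    # the states covered by the initial expert demonstrations.
    states = list(rng.uniform(-1.0, 1.0, size=horizon))
    actions = [expert_action(s) for s in states]
    weight = fit_linear_policy(states, actions)
    for _ in range(iterations):
        # Roll out the learner from a start state outside the demo range.
        visited = rollout(weight, start=2.0, horizon=horizon, rng=rng)
        # Ask the expert to label the states the learner actually reached.
        states.extend(visited)
        actions.extend(expert_action(s) for s in visited)
        # Retrain on the aggregated dataset.
        weight = fit_linear_policy(states, actions)
    return weight, len(states)

if __name__ == "__main__":
    final_weight, dataset_size = dagger()
    print(final_weight, dataset_size)
```

The original DAgger formulation mixes expert and learner actions during rollouts with a decaying probability; this sketch always executes the learner after the first fit, which is a common simplification.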

Results

Survey synthesis:

DAgger and variants: iterative data collection reduces covariate shift measurably; requires continued human involvement during training
Diffusion policies: outperform deterministic BC on contact-rich tasks; better handling of multimodal action distributions reduces error cascades
HITL at test time: human intervention during deployment improves long-horizon task success significantly vs. fully autonomous rollout
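
The point about multimodal action distributions can be illustrated with a small example. It is a sketch under stated assumptions, not a diffusion model: the demonstration data is invented, and sampling from the empirical action set stands in for a learned generative policy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical demonstrations for one ambiguous state: half of the
# experts pass an obstacle on the left (-1.0), half on the right (+1.0).
demo_actions = np.concatenate([np.full(50, -1.0), np.full(50, 1.0)])

# A deterministic policy trained with squared error converges to the
# mean of the demonstrated actions for this state.
point_estimate = demo_actions.mean()

# Stand-in for a generative policy: draw one action from the
# demonstrated distribution instead of averaging it. A real diffusion
# policy would learn this distribution with a denoising network.
sampled_action = rng.choice(demo_actions)

def distance_to_nearest_mode(action):
    # How far an action is from either demonstrated behavior.
    return min(abs(action + 1.0), abs(action - 1.0))

if __name__ == "__main__":
    print(distance_to_nearest_mode(point_estimate))
    print(distance_to_nearest_mode(sampled_action))
```

The averaged action sits exactly between the two demonstrated behaviors and matches neither, while the sampled action commits to one of them. An averaged action of this kind is what pushes a deterministic policy off-distribution in ambiguous states.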

Limitations

Author-stated: HITL requires continuous human attention during deployment, which is not always economically viable
Author-stated: 21+ DOF control spaces are the target, but most surveyed systems are evaluated on simpler setups
Unstated: DAgger’s requirement for human labeling of robot-visited states scales poorly with robot diversity; each new robot configuration requires fresh corrections

Reproducibility

Code: survey paper; references DAgger variants and diffusion policy codebases
Datasets: various dexterous manipulation benchmarks
Compute: not applicable (survey)

Insights

Interactive IL occupies an important but uncomfortable niche: it acknowledges that deployment IS training, but requires keeping humans in the loop beyond the initial demonstration phase. This is exactly the cost you’re trying to eliminate by deploying a robot. The paper’s implicit argument is that for safety-critical or precision tasks, the cost of continued human involvement is worth the reliability gains. The field hasn’t resolved when this tradeoff is favorable.

Connections

Raw Excerpt

Covariate shift (distribution mismatch between demonstration and deployment) → compounding errors. DAgger and variants: iterative correction by human teacher during robot execution. Diffusion policies: model the full action distribution, robust to multimodality. HITL: human-in-the-loop correction at test time improves long-horizon performance.
