Summary

CR-DAgger addresses two core DAgger challenges — collecting informative human corrections and updating policies efficiently — by introducing a Compliant Intervention Interface (kinesthetic teaching with compliance control) and a Compliant Residual Policy (force-aware residual learning on top of a base policy). The system improves base policy success rates by 64% across four contact-rich tasks using minimal correction data.

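CR-DAgger uses a compliance-control interface (which lets a human apply small corrections without interrupting robot execution) and a force-aware residual policy to address the core DAgger challenges of collecting effective correction data and updating the policy efficiently, improving success rates by 64% on four contact-rich tasks.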

Prerequisites

  • DAgger (Dataset Aggregation) — the online imitation learning framework this work builds on; CR-DAgger’s novelty is an improved data collection interface and a residual policy formulation
  • Compliance control — robot control mode that makes joints compliant to external forces, necessary to understand the “gentle correction” mechanism
  • Residual policy learning — the idea of adding a small learned residual on top of a frozen base policy rather than retraining from scratch
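As a refresher on the first prerequisite, the generic DAgger loop can be sketched on a toy 1-D problem. This is only an illustration of dataset aggregation, not CR-DAgger itself: the expert, the dynamics, the linear policy, and every name below (expert_action, rollout, fit, dagger) are hypothetical choices made for the example.

```python
import numpy as np

def expert_action(state):
    # Hypothetical expert: push the state toward zero.
    return -state

def rollout(weight, start, steps=20):
    """Run the current linear policy a = weight * s and return the visited states."""
    states, s = [], start
    for _ in range(steps):
        states.append(s)
        s = s + 0.5 * weight * s  # toy dynamics (assumed): s' = s + 0.5 * a
    return np.array(states)

def fit(states, actions):
    """Least-squares fit of the single weight in a = weight * s."""
    return float(states @ actions / (states @ states))

def dagger(n_iters=5, start=1.0, init_weight=0.3):
    weight = init_weight
    data_s, data_a = np.empty(0), np.empty(0)
    for _ in range(n_iters):
        visited = rollout(weight, start)             # states from the learner's own distribution
        labels = expert_action(visited)              # expert relabels exactly those states
        data_s = np.concatenate([data_s, visited])   # aggregate with all earlier data
        data_a = np.concatenate([data_a, labels])
        weight = fit(data_s, data_a)                 # retrain on the aggregated dataset
    return weight
```

Calling dagger() returns a weight close to -1, the expert's gain. The point to notice is that the labelled states come from rollouts of the learner's current policy rather than from separate expert demonstrations, which is the property CR-DAgger tries to preserve in its correction interface.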

Core Idea

Existing DAgger implementations in robotics either collect offline demonstrations (which may deviate from the base policy’s distribution) or use take-over corrections (which cause force discontinuity). CR-DAgger solves this with a kinesthetic interface that lets humans apply gentle delta corrections while the robot policy is still running — leveraging compliance control to record both position deltas and force feedback. The residual policy then learns from these corrections while predicting both residual motions and target forces, making it force-aware even when the base policy is position-only. This design keeps correction data close to the base policy’s state distribution while enabling fine-grained contact control.
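The composition described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: poses are reduced to 3-D positions, the residual is a hand-written placeholder rather than a trained network, and all names (CompliantCommand, base_policy, residual_policy, compose, correction_label) and numeric values are hypothetical. How the paper defines the recorded delta exactly is also assumed here.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CompliantCommand:
    target_pose: np.ndarray   # position target handed to the compliance controller
    target_force: np.ndarray  # desired contact force predicted by the residual

def base_policy(obs):
    """Hypothetical frozen, position-only base policy: step 1 cm along +x."""
    return obs["pose"] + np.array([0.01, 0.0, 0.0])

def residual_policy(obs):
    """Hypothetical stand-in for the learned residual network."""
    delta_pose = -0.001 * obs["force"]           # placeholder: yield slightly to sensed force
    target_force = np.array([0.0, 0.0, -5.0])    # placeholder: press down with 5 N
    return delta_pose, target_force

def compose(obs):
    """Final command = base target + residual motion, plus a target force."""
    base_target = base_policy(obs)
    delta_pose, target_force = residual_policy(obs)
    return CompliantCommand(base_target + delta_pose, target_force)

def correction_label(base_target, corrected_pose, measured_force):
    """Training target from one correction step (assumed form):
    the pose delta relative to the base policy's target, and the sensed force."""
    return corrected_pose - base_target, measured_force
```

Because the base policy stays frozen and the residual only outputs a small delta and a force target, the correction data only has to cover the gap between the base behavior and the corrected behavior, which is consistent with the small data budget the summary reports.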

Results

Task           | CR-DAgger                     | Base Policy   | Delta
Book flipping  | ~80%+                         | ~20% baseline | +64% overall avg
Belt assembly  | improved                      | baseline      | +64% overall avg
Cable routing  | improved                      | baseline      | +64% overall avg
Gear insertion | improved (sub-mm accuracy)    | baseline      | +64% overall avg

CR-DAgger outperforms both retrain-from-scratch and fine-tuning under the same data budget.

Limitations

  • Author-stated: System evaluated on four specific contact-rich tasks; generalization to other domains not fully explored
  • Author-stated: Requires compliance-capable robot hardware
  • Unstated: The 64% improvement is an average across four tasks — individual task improvements may vary significantly; the gear insertion task (sub-mm accuracy) likely shows different characteristics than cable routing

Reproducibility

  • Code: Hardware design, training data, and policy code are available via the link in the paper
  • Datasets: Real-world collected; four tasks (book flipping, belt assembly, cable routing, gear insertion)
  • Compute: Real-world experiments; no large-scale GPU training mentioned for the residual policy

Insights

The “intention misinterpretation” phenomenon identified here — where robot tracking errors cause the recorded corrections to differ from the human’s intent — is an underappreciated failure mode in kinesthetic teaching that deserves more attention in the imitation learning community. The insight that you can provide force feedback without taking over control is a practical advancement over teleoperation-based DAgger.

Connections

Raw Excerpt

Our system significantly enhances performance on precise contact-rich manipulation tasks using minimal correction data, improving base policy success rates by 64% on four challenging tasks (book flipping, belt assembly, cable routing, and gear insertion) while outperforming both retraining-from-scratch and finetuning approaches.