This note was generated by AI analysis.
Created: 2026-03-25 Source: https://compliant-residual-dagger.github.io/
Summary
CR-DAgger addresses two core DAgger challenges, collecting informative human corrections and updating policies efficiently, by introducing a Compliant Intervention Interface (kinesthetic teaching with compliance control, which lets a human apply small corrections without interrupting the robot's execution) and a Compliant Residual Policy (force-aware residual learning on top of a base policy). The system improves base policy success rates by 64% across four contact-rich tasks using minimal correction data.
Prerequisites
- DAgger (Dataset Aggregation): the online imitation learning framework this work builds on; CR-DAgger’s novelty is an improved data collection interface and a residual policy formulation
- Compliance control: a robot control mode in which the robot yields to external forces instead of rigidly tracking a position; needed to understand the “gentle correction” mechanism
- Residual policy learning: the idea of adding a small learned residual on top of a frozen base policy rather than retraining from scratch
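As a refresher on the first prerequisite, the sketch below shows a plain DAgger loop (not CR-DAgger itself) on a toy 1-D task. Everything in it is an illustrative assumption: the integer state space, the `expert_action` rule, the table-lookup learner in `fit`, and the default action for unseen states. It only aims to show the roll out, relabel, aggregate, retrain cycle.

```python
from typing import Callable, List, Tuple

Policy = Callable[[int], int]


def expert_action(state: int) -> int:
    """Toy expert: step toward the goal state 0 (+1, -1, or 0 at the goal)."""
    return (state < 0) - (state > 0)


def rollout(policy: Policy, start: int, horizon: int) -> List[int]:
    """Run the policy and return the states it visited (before each action)."""
    states, s = [], start
    for _ in range(horizon):
        states.append(s)
        s += policy(s)
    return states


def fit(dataset: List[Tuple[int, int]]) -> Policy:
    """Toy learner: memorize labels; unseen states get the default action +1."""
    table = dict(dataset)
    return lambda s: table.get(s, 1)


def dagger(start: int = 3, horizon: int = 6, iterations: int = 5) -> Policy:
    dataset: List[Tuple[int, int]] = []
    policy = fit(dataset)  # untrained policy: always moves +1
    for _ in range(iterations):
        # 1. Roll out the *current learner*, so data comes from its own states.
        visited = rollout(policy, start, horizon)
        # 2. Ask the expert what it would have done in each visited state.
        dataset.extend((s, expert_action(s)) for s in visited)
        # 3. Retrain on the aggregated dataset.
        policy = fit(dataset)
    return policy


if __name__ == "__main__":
    print(rollout(dagger(), start=3, horizon=6))
```

Each iteration exposes one more state the learner mishandles, which is the distribution-matching property DAgger relies on.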
Core Idea
Existing DAgger implementations in robotics either collect offline demonstrations (which may deviate from the base policy’s distribution) or use take-over corrections (which cause force discontinuity). CR-DAgger solves this with a kinesthetic interface that lets humans apply gentle delta corrections while the robot policy is still running — leveraging compliance control to record both position deltas and force feedback. The residual policy then learns from these corrections while predicting both residual motions and target forces, making it force-aware even when the base policy is position-only. This design keeps correction data close to the base policy’s state distribution while enabling fine-grained contact control.
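The composition described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `CompliantCommand`, `correction_label`, `compose_command`, the 3-D position representation, the choice to feed the base action to the residual policy, and the `max_delta` clipping are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class CompliantCommand:
    """Command for a compliance controller (hypothetical structure)."""
    target_position: Vec3  # base action plus learned residual
    target_force: Vec3     # force target predicted by the residual policy


def correction_label(base_position: Vec3, corrected_position: Vec3,
                     measured_force: Vec3) -> Tuple[Vec3, Vec3]:
    """Turn one human correction into a (position delta, force) training label."""
    delta = tuple(c - b for c, b in zip(corrected_position, base_position))
    return delta, measured_force


def compose_command(observation,
                    base_policy: Callable,
                    residual_policy: Callable,
                    max_delta: float = 0.02) -> CompliantCommand:
    """Add a small learned residual on top of a frozen, position-only base policy."""
    base_position = base_policy(observation)
    delta, force = residual_policy(observation, base_position)
    # Assumption: clip the residual so it stays a small correction.
    clipped = tuple(max(-max_delta, min(max_delta, d)) for d in delta)
    target = tuple(b + d for b, d in zip(base_position, clipped))
    return CompliantCommand(target, tuple(force))


if __name__ == "__main__":
    # Toy stand-ins for the two policies (hypothetical values).
    base = lambda obs: (0.5, 0.0, 0.25)
    residual = lambda obs, b: ((0.01, -0.05, 0.0), (0.0, 0.0, -3.0))
    print(compose_command(None, base, residual))
```

Because the base policy is only queried and never updated, the residual policy is the only component trained on the correction labels.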
Results
| Task | CR-DAgger success rate | Base policy success rate |
|---|---|---|
| Book flipping | ~80%+ | ~20% |
| Belt assembly | improved | baseline |
| Cable routing | improved | baseline |
| Gear insertion | improved (sub-mm accuracy) | baseline |

The reported +64% improvement is an average over all four tasks, not a per-task figure.
CR-DAgger outperforms both retrain-from-scratch and fine-tuning under the same data budget.
Limitations
- Author-stated: System evaluated on four specific contact-rich tasks; generalization to other domains not fully explored
- Author-stated: Requires compliance-capable robot hardware
- Unstated: The 64% improvement is an average across four tasks — individual task improvements may vary significantly; the gear insertion task (sub-mm accuracy) likely shows different characteristics than cable routing
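The hardware requirement above follows from the interface's reliance on compliance control. As a rough illustration of what compliant behavior means, the sketch below simulates a generic 1-D admittance law (a virtual mass-spring-damper). It is a textbook-style model, not the controller used in the paper, and the function name and all gains are illustrative assumptions.

```python
def simulate_admittance(external_force: float,
                        steps: int = 1000,
                        dt: float = 0.002,
                        mass: float = 1.0,
                        damping: float = 20.0,
                        stiffness: float = 100.0,
                        x_ref: float = 0.0) -> float:
    """Return the final position of a 1-D compliant axis under a constant push.

    Virtual dynamics: mass * a + damping * v + stiffness * (x - x_ref) = f_ext
    """
    x, v = x_ref, 0.0
    for _ in range(steps):
        a = (external_force - damping * v - stiffness * (x - x_ref)) / mass
        v += a * dt  # semi-implicit Euler: update velocity first
        x += v * dt  # then position, using the new velocity
    return x


if __name__ == "__main__":
    # A 5 N push settles near external_force / stiffness = 0.05 m from x_ref.
    print(simulate_admittance(5.0))
    # With no external force the axis stays at its reference.
    print(simulate_admittance(0.0))
```

A lower `stiffness` lets the same push move the robot further, which is what allows a human to nudge it gently while a policy keeps commanding the reference.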
Reproducibility
- Code: Hardware design, training data, and policy code are available via the link in the paper
- Datasets: Real-world collected; four tasks (book flipping, belt assembly, cable routing, gear insertion)
- Compute: Real-world experiments; no large-scale GPU training mentioned for the residual policy
Insights
The “intention misinterpretation” phenomenon identified here — where robot tracking errors cause the recorded corrections to differ from the human’s intent — is an underappreciated failure mode in kinesthetic teaching that deserves more attention in the imitation learning community. The insight that you can provide force feedback without taking over control is a practical advancement over teleoperation-based DAgger.
Connections
Raw Excerpt
Our system significantly enhances performance on precise contact-rich manipulation tasks using minimal correction data, improving base policy success rates by 64% on four challenging tasks (book flipping, belt assembly, cable routing, and gear insertion) while outperforming both retraining-from-scratch and finetuning approaches.