Compliant Residual DAgger: Improving Contact-Rich Manipulation with Human Corrections

This note was generated by AI analysis.

Created: 2026-03-25 Source: https://compliant-residual-dagger.github.io/

Summary

CR-DAgger addresses two core DAgger challenges, collecting informative human corrections and updating policies efficiently, with two components. The Compliant Intervention Interface uses kinesthetic teaching under compliance control, so a human can apply small corrections without interrupting the robot’s execution. The Compliant Residual Policy is a force-aware residual learned on top of a base policy. The system improves base policy success rates by 64% across four contact-rich tasks using minimal correction data.

Prerequisites

DAgger (Dataset Aggregation): the online imitation learning framework this work builds on; CR-DAgger’s novelty is an improved data collection interface and a residual policy formulation
Compliance control: a robot control mode in which the robot yields to external forces instead of rigidly tracking its commanded pose; needed to understand the “gentle correction” mechanism
Residual policy learning: adding a small learned residual on top of a frozen base policy rather than retraining from scratch
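
To make the compliance control prerequisite concrete, here is a minimal sketch of a one-dimensional admittance-style controller: the robot behaves like a virtual mass-spring-damper around its reference position, so an external push displaces it instead of being rejected. The function names, gains, and time step are illustrative assumptions and are not taken from the paper.

```python
def admittance_step(x, v, x_ref, f_ext, dt=0.002,
                    mass=2.0, damping=40.0, stiffness=200.0):
    """Advance a 1-DOF virtual mass-spring-damper by one time step.

    Model (assumed, not from the paper):
        mass * a + damping * v + stiffness * (x - x_ref) = f_ext
    Integrated with semi-implicit Euler.
    """
    acc = (f_ext - damping * v - stiffness * (x - x_ref)) / mass
    v = v + acc * dt
    x = x + v * dt
    return x, v


def simulate(f_ext, steps=5000, x_ref=0.0):
    """Hold a constant external force and return the final position."""
    x, v = x_ref, 0.0
    for _ in range(steps):
        x, v = admittance_step(x, v, x_ref, f_ext)
    return x


# A steady 5 N push settles at roughly f_ext / stiffness from the reference.
displacement = simulate(f_ext=5.0)
print(f"steady-state displacement: {displacement:.4f} m")
```

With these hypothetical gains the system is critically damped, so a gentle, sustained push produces a small, smooth offset from the reference. That offset is the kind of signal a kinesthetic correction interface can record.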

Core Idea

Existing DAgger implementations in robotics either collect offline demonstrations (which may deviate from the base policy’s distribution) or use take-over corrections (which cause force discontinuity). CR-DAgger solves this with a kinesthetic interface that lets humans apply gentle delta corrections while the robot policy is still running — leveraging compliance control to record both position deltas and force feedback. The residual policy then learns from these corrections while predicting both residual motions and target forces, making it force-aware even when the base policy is position-only. This design keeps correction data close to the base policy’s state distribution while enabling fine-grained contact control.
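
The residual formulation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: the `CompliantCommand` type, the `compose_action` helper, the policy signatures, and the toy stand-in policies are all hypothetical, and the note does not specify the paper’s actual network architecture or action representation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = List[float]


@dataclass
class CompliantCommand:
    target_position: Vector  # base action plus residual delta
    target_force: Vector     # predicted by the residual policy only


def compose_action(
    base_policy: Callable[[Vector], Vector],
    residual_policy: Callable[[Vector, Vector, Vector], Tuple[Vector, Vector]],
    observation: Vector,
    measured_force: Vector,
) -> CompliantCommand:
    """Combine a position-only base action with a force-aware residual."""
    base_position = base_policy(observation)
    # Assumption: the residual sees the force reading and the base action.
    delta, target_force = residual_policy(observation, measured_force, base_position)
    corrected = [b + d for b, d in zip(base_position, delta)]
    return CompliantCommand(corrected, list(target_force))


# Toy stand-ins for trained policies (hypothetical values).
def toy_base_policy(observation: Vector) -> Vector:
    return [0.5, 0.25, 0.125]


def toy_residual_policy(observation, measured_force, base_position):
    delta = [0.0, 0.0, -0.125]      # small downward correction
    target_force = [0.0, 0.0, 2.0]  # desired contact force
    return delta, target_force


command = compose_action(
    toy_base_policy, toy_residual_policy,
    observation=[0.0], measured_force=[0.0, 0.0, 1.5],
)
print(command)
```

Because the base policy is only queried and never modified, the residual can be trained from a small set of corrections while the base policy’s behavior stays intact wherever the residual outputs zero.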

Results

Task	CR-DAgger	Base policy
Book flipping	~80%+	~20%
Belt assembly	improved	baseline
Cable routing	improved	baseline
Gear insertion	improved (sub-mm accuracy)	baseline

Average improvement over the base policy across all four tasks: +64%. This is an overall figure, not a per-task result.

CR-DAgger outperforms both retrain-from-scratch and fine-tuning under the same data budget.

Limitations

Author-stated: System evaluated on four specific contact-rich tasks; generalization to other domains not fully explored
Author-stated: Requires compliance-capable robot hardware
Unstated: The 64% improvement is an average across four tasks — individual task improvements may vary significantly; the gear insertion task (sub-mm accuracy) likely shows different characteristics than cable routing

Reproducibility

Code: Hardware design, training data, and policy code are available via the link in the paper
Datasets: Real-world collected; four tasks (book flipping, belt assembly, cable routing, gear insertion)
Compute: Real-world experiments; no large-scale GPU training mentioned for the residual policy

Insights

The “intention misinterpretation” phenomenon identified here — where robot tracking errors cause the recorded corrections to differ from the human’s intent — is an underappreciated failure mode in kinesthetic teaching that deserves more attention in the imitation learning community. The insight that you can provide force feedback without taking over control is a practical advancement over teleoperation-based DAgger.
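
One way to picture intention misinterpretation is with a toy one-dimensional example. It assumes, purely for illustration, that a correction is logged as measured pose minus commanded pose and that tracking error and the human’s push add linearly. Neither assumption comes from the paper, and all numbers are hypothetical.

```python
def recorded_correction(commanded, tracking_error, human_push):
    """Correction as it would be logged: measured pose minus commanded pose."""
    measured = commanded + tracking_error + human_push
    return measured - commanded


commanded = 0.100        # m, where the base policy asked the robot to be
tracking_error = -0.010  # m, robot lags behind its command (hypothetical)
human_push = 0.020       # m, displacement the human actually intended

logged = recorded_correction(commanded, tracking_error, human_push)
print(f"intended {human_push:.3f} m, logged {logged:.3f} m")
```

In this toy model the logged correction is half of what the human intended, because the tracking error is folded into the recorded delta. With zero tracking error the two agree.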

Connections

Raw Excerpt

Our system significantly enhances performance on precise contact-rich manipulation tasks using minimal correction data, improving base policy success rates by 64% on four challenging tasks (book flipping, belt assembly, cable routing, and gear insertion) while outperforming both retraining-from-scratch and finetuning approaches.
