How to Train Your Robots? The Impact of Demonstration Modality on Imitation Learning

This note was generated by AI analysis.

Created: 2025-03-10 Source: https://arxiv.org/abs/2503.07017

Summary

This ICRA 2025 paper from Stanford (Li, Cui, Sadigh) systematically compares three robot demonstration modalities (kinesthetic teaching, VR teleoperation, and SpaceMouse) and their effect on imitation learning. Kinesthetic teaching produces the most action-consistent, higher-quality data and better policy performance, but it is physically demanding for the operator. A simple hybrid scheme (30% kinesthetic + 70% VR) achieves the highest success rate on most tasks, improving average success rates by ~20%.

Prerequisites

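Behavior Cloning (BC): the paper trains its policies with diffusion-based BC; understanding BC's covariate shift problem helps explain why state diversity and action consistency each matter
Diffusion Policy: the paper uses diffusion-based BC as its policy architecture; note that it predicts an action chunk (16 steps, of which 8 are executed) rather than a single step
Kinematic Replay: motions recorded during kinesthetic teaching must be converted into delta-pose commands through a replay mechanism; in contact tasks, replay jerkiness is the main cause of failure
Cartesian Impedance Control: the Franka Panda runs in Cartesian impedance mode; understanding this mode helps explain why tasks involving contact forces are harder for kinesthetic replay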
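
The action-chunk behaviour mentioned for Diffusion Policy (predict 16 steps, execute 8, then replan) can be illustrated with a minimal receding-horizon loop. This is only a sketch: `predict_chunk` and `step_env` are hypothetical callables standing in for a trained policy and a robot or simulator interface, and neither comes from the paper.

```python
from typing import Callable, List, Sequence

Action = Sequence[float]
Observation = Sequence[float]


def run_receding_horizon(
    predict_chunk: Callable[[Observation], List[Action]],  # hypothetical policy
    step_env: Callable[[Action], Observation],             # hypothetical env step
    initial_obs: Observation,
    total_steps: int,
    horizon: int = 16,   # actions predicted per policy call
    execute: int = 8,    # actions actually executed before replanning
) -> List[Action]:
    """Predict `horizon` actions, execute the first `execute`, then replan."""
    obs = initial_obs
    executed: List[Action] = []
    while len(executed) < total_steps:
        chunk = predict_chunk(obs)
        if len(chunk) != horizon:
            raise ValueError("policy must return exactly `horizon` actions")
        # Only the first part of the chunk is used; the rest is discarded
        # and re-predicted from the newest observation.
        for action in chunk[:execute]:
            obs = step_env(action)
            executed.append(action)
            if len(executed) == total_steps:
                break
    return executed
```

Under these assumptions, a 20-step episode calls the policy three times (8 + 8 + 4 executed actions), so the policy always acts on an observation that is at most 8 steps old.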

Core Idea

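Demonstration modalities face a fundamental trade-off along two dimensions: action consistency and state diversity. Because the human guides the robot arm directly in kinesthetic teaching, the motion paths are highly repeatable and the k-NN action variance is the lowest. Teleoperation, in contrast, explores a wider range of states because the operator's attempts differ slightly each time. Diffusion-based BC benefits from both: high action consistency provides a clear imitation target, and high state diversity provides the ability to recover. The hybrid scheme outperforms any single modality precisely because it has both properties.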
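
To make the two dimensions concrete, the sketch below shows one plausible way to compute a k-NN action variance and a crude state-spread score from (state, action) pairs. The exact metric definitions used in the paper are not given in this note, so the function names, the choice of Euclidean distance, and the use of total state variance as a diversity proxy are all assumptions made for illustration.

```python
import math
from statistics import pvariance
from typing import Sequence

Vector = Sequence[float]


def knn_action_variance(states: Sequence[Vector],
                        actions: Sequence[Vector],
                        k: int = 5) -> float:
    """Average, over all samples, of the action variance among each
    sample's k nearest neighbours in state space (lower = more consistent)."""
    n = len(states)
    if len(actions) != n:
        raise ValueError("states and actions must have the same length")
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of samples")
    action_dim = len(actions[0])
    total = 0.0
    for i in range(n):
        # Indices of the k states closest to state i (excluding i itself).
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(states[i], states[j]),
        )[:k]
        # Sum the per-dimension variance of the neighbours' actions.
        total += sum(
            pvariance([actions[j][d] for j in neighbours])
            for d in range(action_dim)
        )
    return total / n


def state_spread(states: Sequence[Vector]) -> float:
    """Crude diversity proxy (an assumption): total variance of the states."""
    state_dim = len(states[0])
    return sum(pvariance([s[d] for s in states]) for d in range(state_dim))
```

With these definitions, a dataset whose nearby states always carry the same action scores 0 on `knn_action_variance`, while a dataset that visits a wider region of state space scores higher on `state_spread`.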

Results

Task	Kinesthetic	VR	SpaceMouse	Hybrid (30% Kinesthetic + 70% VR)
Open Drawer	95%	~70%	~65%	100%
Flip Glass	70%	~55%	~50%	75%
Push Sanitizer	35%	~55%	~50%	N/A

In Push Sanitizer, the jerkiness of kinesthetic replay (jitter introduced by contact-force compensation) reverses the ranking: VR and SpaceMouse perform better than kinesthetic teaching.
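
As a rough illustration of what replay involves, the sketch below converts a recorded end-effector trajectory into delta-position commands and scores its jerkiness with a third finite difference. It is position-only and ignores orientation and force compensation; the function names and the jerk measure are assumptions for illustration, not the paper's replay pipeline.

```python
from typing import List, Sequence

Vector = Sequence[float]


def to_delta_commands(positions: Sequence[Vector]) -> List[List[float]]:
    """Turn absolute recorded positions into per-step delta commands."""
    return [
        [b - a for a, b in zip(prev, curr)]
        for prev, curr in zip(positions, positions[1:])
    ]


def mean_squared_jerk(positions: Sequence[Vector], dt: float) -> float:
    """Mean squared jerk, using the third finite difference of position.

    Needs at least four samples; larger values mean a jerkier trajectory.
    """
    if len(positions) < 4:
        raise ValueError("need at least four samples to estimate jerk")
    squared = []
    for p0, p1, p2, p3 in zip(positions, positions[1:],
                              positions[2:], positions[3:]):
        jerk = [(d - 3 * c + 3 * b - a) / dt ** 3
                for a, b, c, d in zip(p0, p1, p2, p3)]
        squared.append(sum(j * j for j in jerk))
    return sum(squared) / len(squared)
```

A constant-velocity recording gives identical delta commands and zero jerk under this measure, whereas small alternating offsets (such as those a compensation heuristic might introduce) raise the score sharply.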

Limitations

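Author-stated: a single demonstrator (limited style variation); the kinesthetic setup lacks a force sensor and relies on heuristic compensation; data from non-experts was too noisy to analyze
Unstated: it is unknown whether the 30%/70% mixing ratio remains optimal for other tasks or robots; the number of experimental tasks is small (3); only Diffusion Policy was tested, so whether the conclusions carry over to other architectures such as ACT or BC-Z is unverified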

Reproducibility

Code: no open-source link mentioned
Datasets: collected in-house by the lab, 100 trajectories × 3 modalities × 3 tasks
Compute: Franka Panda robot (7-DoF), Diffusion Policy training (the paper does not state GPU resources)
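
Given per-modality pools like the ones listed above, a 30% kinesthetic + 70% VR training set could be assembled roughly as follows. This is a sketch under assumptions: the function name, the dictionary layout, the fixed seed, and a hybrid set size of 100 are all hypothetical, since this note does not state how the authors sampled their mixture.

```python
import random
from typing import Dict, List, TypeVar

T = TypeVar("T")


def mix_demonstrations(pools: Dict[str, List[T]],
                       fractions: Dict[str, float],
                       total: int,
                       seed: int = 0) -> List[T]:
    """Sample `total` trajectories across modalities by the given fractions."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)  # fixed seed so the mixture is reproducible
    mixed: List[T] = []
    for name, fraction in fractions.items():
        # Rounded per-modality count; for awkward fractions the counts
        # may not add up exactly to `total`.
        count = round(total * fraction)
        if count > len(pools[name]):
            raise ValueError(f"not enough trajectories in pool '{name}'")
        mixed.extend(rng.sample(pools[name], count))  # without replacement
    rng.shuffle(mixed)
    return mixed


# Hypothetical usage: trajectories are represented by (modality, index) tags.
pools = {
    "kinesthetic": [("kinesthetic", i) for i in range(100)],
    "vr": [("vr", i) for i in range(100)],
}
hybrid = mix_demonstrations(pools, {"kinesthetic": 0.3, "vr": 0.7}, total=100)
```

Sampling without replacement keeps every trajectory in the hybrid set unique, which matters when the pools are as small as 100 trajectories each.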

Insights

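User preference and data quality are decoupled: kinesthetic teaching, objectively the most effective modality, is the one users are least willing to use over long periods (because of physical fatigue). This explains why existing large datasets (such as Open X-Embodiment) mostly rely on teleoperation, and it also points to a source of systematic bias in those datasets.
Potential of combining hybrid collection with MimicGen: first collect a small number of kinesthetic seed demos with high action consistency, then expand them automatically with MimicGen, which may achieve both quality and scale.
Contact-force tasks are the Achilles' heel of kinesthetic teaching: the replay mechanism introduces jerkiness in tasks that require actively applying force, and a fundamental fix requires integrating a force sensor.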

Connections

Raw Excerpt

“kinesthetic teaching exhibits high action consistency, while teleoperation via VR or spacemouse offer higher state diversity. These complementary properties explain why a mixture of both modalities achieves the best policy performance.”
