ARCap: Collecting High-quality Human Demonstrations for Robot Learning with Augmented Reality Feedback

This article was generated by AI analysis.

Created: 2026-04-05 Source: https://arxiv.org/abs/2410.08464

Summary

ARCap is an AR-based portable data collection system that solves a key failure mode in robot learning data collection: demonstrations that are kinematically infeasible for the robot. Using a Meta Quest 3 with AR passthrough, Rokoko data gloves, and a RealSense depth camera, the system overlays a virtual robot on the real world and provides real-time visual and haptic feedback when the user’s motion exceeds robot kinematic limits. This raises replay success rates by 40%+ and enables novice users to collect deployment-quality data.

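ARCap is an AR-based portable data collection system that uses the Meta Quest 3's AR passthrough mode to overlay a virtual robot on the real world, providing real-time visual and haptic feedback to prevent joint-limit violations and collisions. Compared with DexCap, which gives no feedback, it raises the replay success rate by 40%+ and lets novices collect deployment-quality demonstrations.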

Prerequisites

Inverse kinematics — ARCap uses real-time IK to retarget human hand poses to robot joint configurations; latency constraints drive the hardware choices
AR/VR passthrough — the Meta Quest 3 MR passthrough is the enabling technology for overlaying virtual content on real-world scenes
Diffusion policy + PointNet — the downstream IL pipeline uses these for training; understanding them helps interpret evaluation results
Embodiment gap — the fundamental mismatch between human hand anatomy and robot end-effector that ARCap’s feedback is designed to bridge
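
To make the inverse kinematics prerequisite concrete, here is a minimal sketch of analytic IK for a hypothetical two-link planar arm. It is not ARCap's solver: the link lengths, function names, and the choice of the solution branch with a non-negative elbow angle are all assumptions made for illustration. It shows the basic idea that a target pose either maps to joint angles or is reported as unreachable.

```python
import math


def planar_2link_ik(x, y, l1=0.3, l2=0.25):
    """Analytic IK for a hypothetical 2-link planar arm (illustrative only).

    Returns (q1, q2) in radians for the branch with q2 >= 0,
    or None when the target lies outside the reachable region.
    """
    r2 = x * x + y * y
    cos_q2 = (r2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(cos_q2) > 1.0:
        return None  # target is too far away or too close to the base
    q2 = math.acos(cos_q2)
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
    return q1, q2


def planar_2link_fk(q1, q2, l1=0.3, l2=0.25):
    """Forward kinematics for the same arm, used to check the IK result."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y
```

Feeding the IK result back through `planar_2link_fk` should reproduce the target position, which is a quick sanity check for any retargeting step. A real arm with more joints would typically need a numerical solver instead of a closed form.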

Core Idea

The core problem is that humans naturally collect demonstrations that violate robot constraints: they move too fast, extend beyond joint limits, or cause collisions with objects. Without feedback, these problems are invisible during collection and are discovered only when the robot fails to replay the trajectory. ARCap’s insight is to close the feedback loop at collection time rather than rely on post-hoc filtering: the AR display shows the user exactly what the robot would do, making the embodiment gap viscerally clear before any data is recorded.
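
The collection-time check described above can be sketched as a per-step feasibility test on the retargeted joint configuration. This is a hedged illustration, not ARCap's implementation: the `Limits` class, the `check_step` function, the per-joint speed threshold, and the message strings are hypothetical, and collision checking is omitted because it depends on the scene model.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Hypothetical per-joint limits for the target robot."""
    q_min: tuple      # lower joint limits, radians
    q_max: tuple      # upper joint limits, radians
    max_speed: float  # assumed per-joint speed limit, rad/s


def check_step(q_prev, q_curr, dt, limits):
    """Return a list of violations for one retargeted step.

    An empty list means the step is feasible; a non-empty list is what
    a feedback layer (visual or haptic) would react to.
    """
    violations = []
    # Joint-range check on the current configuration.
    for i, (lo, hi, q) in enumerate(zip(limits.q_min, limits.q_max, q_curr)):
        if q < lo or q > hi:
            violations.append(f"joint {i} out of range")
    # Finite-difference speed check between consecutive configurations.
    for i, (a, b) in enumerate(zip(q_prev, q_curr)):
        if abs(b - a) / dt > limits.max_speed:
            violations.append(f"joint {i} too fast")
    return violations
```

For example, with `Limits((-1.0, -1.0), (1.0, 1.0), 2.0)`, a small step from `(0.0, 0.0)` to `(0.1, 0.1)` over 0.1 s passes, while a jump to `(1.5, 0.1)` is flagged for both range and speed on joint 0. Running the check during collection rather than afterwards is the design choice the paragraph above describes: the user can correct the motion before the step is recorded.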

Results

Metric	ARCap	DexCap (no feedback)	Delta
Replay success rate	Significantly higher	Baseline	+40%
Scene visibility	Higher	Baseline	+60%
Cluttered scene task success	70%	25%	+45pp
3-stage Lego assembly (parallel gripper)	40%	0%	+40pp
Collision during testing	0	Multiple	—

User study: 20 participants across multiple tasks.

Limitations

Author-stated: requires a Windows laptop for real-time IK (portability is somewhat limited by compute)
Author-stated: currently validated on two end-effectors; general multi-finger dexterous hands may require additional retargeting work
Unstated: Meta Quest 3 MR passthrough quality limits fine-grained hand-eye coordination for very precise tasks (sub-millimeter)
Unstated: the depth camera + Quest 3 combination adds ~200-300ms system latency that could affect natural motion quality
Unstated: Rokoko gloves are $1,500+, making the full system $2,500; cheaper than DexCap’s EM gloves but still significant

Reproducibility

Code: fully open-source
Hardware: all off-the-shelf; assembly instructions provided
Compute: policy training uses a diffusion policy on PointNet features (single GPU)

Insights

ARCap reframes the data collection problem from “how to capture human motion” to “how to guide humans to move like robots.” This is a subtle but important shift: the constraint is not human capability but human knowledge of robot kinematics. AR feedback transfers that knowledge in real-time without requiring users to internalize joint limits abstractly.

The +40% replay success rate is a remarkable improvement from what is essentially a UI change (adding AR feedback). It suggests that a large fraction of existing robot learning datasets may contain demonstrations that are lower quality than necessary — not because of bad human performance but because of missing feedback.

The cross-embodiment capability (same system works for dexterous hands and parallel grippers) is practically significant: most competing systems require hardware redesigns for different end-effectors.

Connections

Raw Excerpt

“ARCap enables novice users to collect robot-executable data that matches robot kinematics and avoids collisions with the scenes.”
