This note was generated by AI analysis.
Created: 2026-04-05. Source: https://arxiv.org/abs/2410.08464
Summary
ARCap is a portable, AR-based data collection system that targets a key failure mode in robot learning data collection: demonstrations that are kinematically infeasible for the robot. Using a Meta Quest 3 in AR passthrough mode, Rokoko data gloves, and a RealSense depth camera, the system overlays a virtual robot on the real world and gives real-time visual and haptic feedback when the user’s motion would exceed the robot’s joint limits or cause a collision. Compared with DexCap, which provides no feedback, this raises replay success rates by more than 40% and lets novice users collect deployment-quality demonstrations.
Prerequisites
- Inverse kinematics — ARCap uses real-time IK to retarget human hand poses to robot joint configurations; latency constraints drive the hardware choices
- AR/VR passthrough — the Meta Quest 3 MR passthrough is the enabling technology for overlaying virtual content on real-world scenes
- Diffusion policy + PointNet — the downstream IL pipeline uses these for training; understanding them helps interpret evaluation results
- Embodiment gap — the fundamental mismatch between human hand anatomy and robot end-effector that ARCap’s feedback is designed to bridge
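To make the inverse-kinematics prerequisite concrete, here is a minimal sketch for a planar two-link arm. It is a textbook illustration rather than ARCap's solver: the link lengths, the function names `two_link_ik` and `forward`, and the single-solution (one elbow configuration) convention are all assumptions chosen for the example.

```python
import math

def two_link_ik(x, y, l1=0.3, l2=0.25):
    """Analytic IK for a planar 2-link arm (hypothetical toy model).

    Returns (shoulder, elbow) joint angles in radians, or None when the
    target lies outside the reachable workspace.
    """
    d2 = x * x + y * y
    # The law of cosines gives the elbow angle.
    cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(cos_elbow) > 1.0:
        return None  # target too far or too close: kinematically infeasible
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(
        l2 * math.sin(elbow), l1 + l2 * math.cos(elbow)
    )
    return shoulder, elbow

def forward(shoulder, elbow, l1=0.3, l2=0.25):
    """Forward kinematics, used here only to sanity-check the IK result."""
    x = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    y = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return x, y
```

A `None` result is the toy analogue of a demonstration the robot cannot follow: the human hand reached a point the arm cannot.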
Core Idea
The core problem is that humans naturally produce demonstrations that violate robot constraints: they move too fast, reach beyond joint limits, or collide with objects. Without feedback, these problems are invisible during collection and surface only when the robot fails to replay the trajectory. ARCap’s insight is to close the feedback loop at collection time instead of filtering bad demonstrations afterwards: the AR display shows the user exactly what the robot would do, making the embodiment gap clear before any data is recorded.
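The collection-time feedback loop described above can be sketched as a per-frame feasibility check. This is a hedged illustration, not ARCap's code: `JOINT_LIMITS`, `MAX_JOINT_SPEED`, `Feedback`, and `check_frame` are hypothetical names, the limit values are made up, and collision checking is omitted.

```python
from dataclasses import dataclass, field

# Hypothetical limits for a 3-joint arm (radians, radians/second).
JOINT_LIMITS = [(-2.9, 2.9), (-1.8, 1.8), (-2.9, 2.9)]
MAX_JOINT_SPEED = 2.0

@dataclass
class Feedback:
    limit_violations: list = field(default_factory=list)  # joint indices
    speed_violations: list = field(default_factory=list)  # joint indices

    @property
    def warn(self):
        # In an AR setup this flag would drive the visual and haptic cues.
        return bool(self.limit_violations or self.speed_violations)

def check_frame(prev_q, q, dt):
    """Compare the retargeted joint vector q against limits and speed caps."""
    fb = Feedback()
    for i, ((lo, hi), q_prev, q_now) in enumerate(zip(JOINT_LIMITS, prev_q, q)):
        if not lo <= q_now <= hi:
            fb.limit_violations.append(i)
        if abs(q_now - q_prev) / dt > MAX_JOINT_SPEED:
            fb.speed_violations.append(i)
    return fb
```

Running the check on every tracked frame, before anything is written to disk, is what distinguishes this design from filtering recorded trajectories afterwards.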
Results
| Metric | ARCap | DexCap (no feedback) | Delta |
|---|---|---|---|
| Replay success rate | Significantly higher | Baseline | +40% |
| Scene visibility | Higher | Baseline | +60% |
| Cluttered scene task success | 70% | 25% | +45pp |
| 3-stage Lego assembly (parallel gripper) | 40% | 0% | +40pp |
| Collision during testing | 0 | Multiple | — |
User study: 20 participants across multiple tasks.
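The table mixes deltas written as percentages with deltas written as percentage points (pp), so a quick calculation helps when reading it. The sketch below uses only the two rows that report absolute rates for both systems; the helper name `pp_delta` is an assumption for illustration.

```python
def pp_delta(arcap_pct, baseline_pct):
    """Absolute difference between two rates, in percentage points."""
    return arcap_pct - baseline_pct

# (ARCap %, DexCap %) taken from the table rows that give both values.
rows = {
    "Cluttered scene task success": (70, 25),
    "3-stage Lego assembly (parallel gripper)": (40, 0),
}

deltas = {name: pp_delta(a, b) for name, (a, b) in rows.items()}
for name, d in deltas.items():
    print(f"{name}: +{d} pp")
```

A relative improvement cannot be computed for the Lego row because its baseline is 0%, which is one reason to report percentage points there.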
Limitations
- Author-stated: requires a Windows laptop for real-time IK, so portability is somewhat limited by compute
- Author-stated: currently validated on two end-effectors; general multi-finger dexterous hands may require additional retargeting work
- Unstated: Meta Quest 3 MR passthrough quality limits fine-grained hand-eye coordination for very precise tasks (sub-millimeter)
- Unstated: the depth camera plus Quest 3 combination adds roughly 200-300 ms of system latency, which could affect how natural the demonstrated motion is
- Unstated: Rokoko gloves cost about $2,500, which is cheaper than DexCap’s EM gloves but still a significant expense
Reproducibility
- Code: fully open-source
- Hardware: all off-the-shelf; assembly instructions provided
- Compute: policy training uses a diffusion policy on PointNet features (single GPU)
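For readers unfamiliar with PointNet features, the central idea is a shared per-point network followed by a symmetric pooling step, which makes the feature independent of point order. The sketch below is a minimal pure-Python illustration with random, untrained weights; a single linear layer stands in for the real multi-layer network, and none of it is the authors' training code. The names `shared_mlp` and `pointnet_feature` are hypothetical.

```python
import random

def shared_mlp(point, weights, bias):
    """Apply the same linear layer + ReLU to one 3D point."""
    return [
        max(0.0, sum(w * p for w, p in zip(row, point)) + b)
        for row, b in zip(weights, bias)
    ]

def pointnet_feature(points, weights, bias):
    """Per-point shared layer followed by channel-wise max pooling."""
    per_point = [shared_mlp(p, weights, bias) for p in points]
    return [max(channel) for channel in zip(*per_point)]

rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 4 channels
bias = [0.0] * 4
cloud = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(16)]

# Shuffling the points must not change the pooled feature.
shuffled = cloud[:]
rng.shuffle(shuffled)
same = pointnet_feature(cloud, weights, bias) == pointnet_feature(shuffled, weights, bias)
```

Order invariance matters here because a depth camera returns points in no meaningful order.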
Insights
ARCap reframes the data collection problem from “how to capture human motion” to “how to guide humans to move like robots.” This is a subtle but important shift: the constraint is not human capability but human knowledge of robot kinematics. AR feedback transfers that knowledge in real time without requiring users to internalize joint limits abstractly.
The +40% replay success rate is a remarkable improvement from what is essentially a UI change (adding AR feedback). It suggests that a large fraction of existing robot learning datasets may contain demonstrations of lower quality than necessary, not because the human demonstrators performed badly but because they had no feedback.
The cross-embodiment capability (same system works for dexterous hands and parallel grippers) is practically significant: most competing systems require hardware redesigns for different end-effectors.
Connections
- Clippings-anyteleop-vision-based-dexterous-teleoperation
- Clippings-humanoid-teleop-with-full-body-tracking-using-the-meta-quest-3-and-isaacsim-simu
- Clippings-datalab-output-2510.10903v1.pdf
Raw Excerpt
“ARCap enables novice users to collect robot-executable data that matches robot kinematics and avoids collisions with the scenes.”