Summary

OSMO is a wearable, open-source tactile glove with 12 three-axis magnetic sensors on the fingertips and palm. It is designed to collect contact-rich human manipulation demonstrations for robot imitation learning. Its key contribution is showing that a robot policy can be trained purely from human demonstrations, with no real robot data. On a marker-wiping task, a diffusion policy trained exclusively on OSMO data achieved a 71.69% success rate, vs. 55.75% for a vision-only policy.

Prerequisites

  • Tactile sensing fundamentals: the difference between shear and normal force measurement explains why three-axis sensors are needed for grasping tasks.
  • Imitation learning / behavior cloning: the paper’s core contribution is a data collection device, so it helps to understand how demonstration data is used to train robot policies.
  • Hand pose retargeting: OSMO relies on HaMeR and inverse kinematics (IK) to convert human hand keypoints into robot joint angles; the embodiment gap between human and robot hands motivates this pipeline.
  • Diffusion policy: the downstream policy architecture (diffusion with action chunking) determines which input modalities matter and at what frequency.
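The retargeting prerequisite can be illustrated with the simplest possible case. The following is a minimal sketch, assuming a hypothetical planar two-link finger with made-up link lengths. It shows what "IK from a keypoint to joint angles" means analytically. It does not reproduce OSMO's pipeline, which uses HaMeR keypoints and an IK solve over the full robot hand.

```python
import math
from typing import Tuple

# Hypothetical link lengths in metres for a planar two-link finger (not OSMO's values).
L1 = 0.045
L2 = 0.030

def finger_fk(q1: float, q2: float) -> Tuple[float, float]:
    """Forward kinematics: joint angles (rad) -> fingertip position in the base frame."""
    x = L1 * math.cos(q1) + L2 * math.cos(q1 + q2)
    y = L1 * math.sin(q1) + L2 * math.sin(q1 + q2)
    return x, y

def finger_ik(x: float, y: float) -> Tuple[float, float]:
    """Analytic IK for a planar 2-link chain, choosing the q2 >= 0 (flexed) branch.

    Raises ValueError if the target fingertip position is out of reach.
    """
    c2 = (x * x + y * y - L1 * L1 - L2 * L2) / (2 * L1 * L2)
    if not -1.0 <= c2 <= 1.0:
        raise ValueError("target out of reach")
    q2 = math.acos(c2)
    q1 = math.atan2(y, x) - math.atan2(L2 * math.sin(q2), L1 + L2 * math.cos(q2))
    return q1, q2

# A fingertip keypoint expressed in the finger's base frame (hypothetical values).
q1, q2 = finger_ik(0.05, 0.03)
tip = finger_fk(q1, q2)  # round-trips back to (0.05, 0.03) up to float error
```

A real retargeter solves a joint optimization over all fingers, often with joint limits and a fingertip-position objective. The closed-form case above is only the single-chain intuition behind it.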

Core Idea

OSMO’s key insight is that the bottleneck in robot manipulation data collection is not the volume of demonstrations but the absence of contact information. When humans demonstrate tasks, they unconsciously regulate finger forces in ways cameras cannot see. OSMO instruments the human hand with a sensor modality (magnetic tactile arrays) that can also be placed on robot fingertips. This lets it capture contact signatures that transfer directly as conditioning signals for downstream policies. The zero-real-robot-data result is significant. It suggests that the sensor’s signal space is rich enough to bridge the human-robot embodiment gap without domain randomization or real-robot fine-tuning, at least for the task evaluated.
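The following makes "contact signatures as conditioning signals" concrete. It is a minimal sketch, assuming each of the 12 sensors reports a calibrated 3-vector whose z-axis is the surface normal. Splitting each reading into a normal component and a shear magnitude, then concatenating with proprioception into one observation vector, is an illustrative choice. It is not OSMO's documented preprocessing. The split also shows why three axes matter: a normal-only sensor would see `fz` but miss the lateral drag that dominates a wiping stroke.

```python
import math
from typing import List, Sequence, Tuple

NUM_SENSORS = 12  # fingertip and palm sensors on the glove
AXES = 3          # each sensor is three-axis (x, y, z)

def tactile_features(readings: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Turn 12 three-axis readings into per-sensor (normal, shear) pairs.

    Assumes z is each sensor's surface normal and x, y lie in the contact plane.
    Returns a flat list of length 2 * NUM_SENSORS.
    """
    if len(readings) != NUM_SENSORS or any(len(r) != AXES for r in readings):
        raise ValueError("expected 12 readings of 3 axes each")
    feats: List[float] = []
    for fx, fy, fz in readings:
        feats.append(fz)                  # normal (pressing) component
        feats.append(math.hypot(fx, fy))  # shear (dragging) magnitude
    return feats

def observation(readings: Sequence[Tuple[float, float, float]],
                joint_angles: Sequence[float]) -> List[float]:
    """Concatenate raw tactile values (36), derived features (24) and proprioception."""
    raw = [v for r in readings for v in r]
    return raw + tactile_features(readings) + list(joint_angles)
```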

Results

Task                        Tactile policy   Vision-only   Proprioception-only
Marker wiping (success %)   71.69 ± 27.43    55.75         27.12
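The gaps in the table are easier to read as absolute and relative differences. The following is a small arithmetic check on the reported means only. The table does not say what the ± 27.43 spread is computed over, so this is not a significance test.

```python
# Mean success rates (%) on marker wiping, copied from the results table.
RESULTS = {"tactile": 71.69, "vision_only": 55.75, "proprio_only": 27.12}

def gap(a, b):
    """Return (absolute gap in percentage points, relative gain of a over b)."""
    diff = a - b
    return diff, diff / b

for baseline in ("vision_only", "proprio_only"):
    pp, rel = gap(RESULTS["tactile"], RESULTS[baseline])
    print(f"tactile vs {baseline}: +{pp:.2f} pp ({rel:.1%} relative)")
```

This gives roughly +15.9 pp (about 29% relative) over vision-only and +44.6 pp over proprioception-only. Because the tactile mean minus its reported spread (44.26) falls below the vision-only mean, the vision gap should be read cautiously until the trial count and the meaning of the ± are known.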

Limitations

  • Author-stated: Magnetic sensors require crosstalk mitigation (MuMetal shielding); glove fit varies across hand sizes, potentially affecting sensor placement.
  • Unstated: Only one task (marker wiping) is tested; the zero-real-robot-data claim still needs validation on diverse contact-rich tasks that require precise finger coordination. The Psyonic Ability Hand is an unusual robot platform (it originated as a prosthetic hand), and generalization to standard dexterous hands such as Allegro or LEAP is not demonstrated.

Reproducibility

  • Code: Open-source hardware and software at Meta FAIR GitHub (linked in paper)
  • Datasets: 140 human demonstrations (~2 hours), custom wiping task
  • Compute: Diffusion policy training on a standard GPU; inference at 2 Hz
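Inference at 2 Hz is workable because a diffusion policy with action chunking predicts several future actions per query rather than one. The following is a minimal receding-horizon sketch. It assumes a hypothetical 20 Hz control loop and a hypothetical chunk of 16 actions, of which the first 10 are executed before re-planning. The summary does not state OSMO's chunk length, execution horizon, or control rate. `predict_chunk` is a hypothetical stand-in for the diffusion sampler.

```python
from typing import Callable, List, Tuple

CONTROL_HZ = 20   # hypothetical low-level control rate
POLICY_HZ = 2     # inference rate reported for the policy
CHUNK_LEN = 16    # hypothetical number of actions predicted per query
EXEC_STEPS = CONTROL_HZ // POLICY_HZ  # actions executed per query: 10

def run_episode(predict_chunk: Callable[[int], List[float]],
                total_steps: int) -> Tuple[List[float], int]:
    """Receding-horizon execution of an action-chunking policy.

    predict_chunk(t) stands in for sampling the diffusion policy on the
    observation at control step t and must return CHUNK_LEN actions.
    Returns the executed actions and the number of policy queries made.
    """
    executed: List[float] = []
    queries = 0
    t = 0
    while t < total_steps:
        chunk = predict_chunk(t)
        if len(chunk) != CHUNK_LEN:
            raise ValueError("policy must return CHUNK_LEN actions")
        queries += 1
        # Execute only a prefix of the chunk, then re-plan from a fresh observation.
        for action in chunk[:EXEC_STEPS]:
            if t >= total_steps:
                break
            executed.append(action)  # on hardware: send to the hand at CONTROL_HZ
            t += 1
    return executed, queries

# Dummy policy: each action value encodes the control step it was planned for.
actions, n_queries = run_episode(lambda t: [float(t + i) for i in range(CHUNK_LEN)], 40)
```

Executing only a prefix of each chunk means the policy sees new tactile readings every 0.5 s while the robot still receives a smooth stream of actions between queries.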

Insights

The “zero real robot data” framing positions OSMO as a scalable data collection amplifier: if human hands can serve as the data source, demonstration throughput is limited by the availability of human operators rather than robots. This directly parallels the UMI (Universal Manipulation Interface) philosophy, applied to the dexterous (multi-finger) regime. The magnetic sensing approach trades the optical richness of sensors like GelSight for wearability and compatibility with hand-tracking systems, a worthwhile tradeoff for in-the-wild collection.

Raw Excerpt

“A robot policy trained exclusively on human demonstrations collected with OSMO, without any real robot data, is capable of executing a challenging contact-rich manipulation task.”