Summary

Hsieh et al. (UC Berkeley, arXiv:2512.02011) present DexScrew, a sim-to-real framework for contact-rich dexterous manipulation (screwdriving, nut-bolt fastening) that explicitly handles imperfect simulation. Key insight: RL in simplified simulation learns transferable rotational motion primitives; these primitives bootstrap real-world teleoperation to collect tactile demonstrations, enabling final behavior cloning policies that generalize to unseen object geometries.

Prerequisites

  • Reinforcement learning for robotics (policy gradients, sim-to-real)
  • Dexterous hand manipulation; multi-fingered grasping
  • Tactile sensing and proprioception
  • Behavior cloning / imitation learning

Core Idea

Two standard approaches each break down for contact-rich dexterous manipulation: (1) sim-to-real RL needs accurate simulation of complex contact dynamics and tactile sensing, which is intractable in practice; (2) teleoperation-based imitation learning needs high-quality dexterous demonstrations, which are hard to collect at scale because of the morphology gap between the human hand and the robot hand.

DexScrew’s three-stage hybrid:

  1. RL in simplified simulation: train on simplified object models that capture the essential rotational structure but ignore fine contact details → learn correct finger gaits (motion primitives)
  2. Skill-assisted teleoperation: use the sim-trained policy as a skill primitive to guide human teleoperation in the real world → enables efficient collection of contact-rich demonstrations with tactile + proprioceptive data
  3. Behavior cloning with tactile sensing: train BC policy on real-world demonstrations → generalizes to diverse object geometries and is robust to perturbations
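Stage 2 can be pictured as a data-collection loop in which the operator only decides when to rotate and the sim-trained primitive supplies the finger motion. The following is a minimal sketch under stated assumptions, not the authors' implementation: `rotation_primitive` is a hypothetical sinusoidal stand-in for the sim-trained policy, `fake_tactile` is a placeholder for real sensor readings, and the on/off gating interface is an assumption about how the operator interacts with the skill.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Step:
    proprio: List[float]  # joint positions observed before acting
    tactile: List[float]  # tactile readings observed before acting
    action: List[float]   # joint targets that were commanded


def rotation_primitive(phase: float, n_joints: int = 4) -> List[float]:
    """Hypothetical stand-in for the sim-trained policy: a periodic finger gait."""
    return [0.3 * math.sin(phase + 2 * math.pi * j / n_joints)
            for j in range(n_joints)]


def fake_tactile(joints: Sequence[float]) -> List[float]:
    """Placeholder sensor model (assumption): contact signal grows with flexion."""
    return [abs(q) for q in joints]


def collect_demo(operator_wants_rotation: Callable[[int], bool],
                 read_tactile: Callable[[Sequence[float]], List[float]],
                 n_steps: int = 20,
                 n_joints: int = 4,
                 dphase: float = 0.3) -> List[Step]:
    """Record one demonstration while the operator gates the primitive."""
    joints = [0.0] * n_joints
    phase = 0.0
    demo: List[Step] = []
    for t in range(n_steps):
        if operator_wants_rotation(t):
            # High-level decision from the human; low-level gait from the skill.
            action = rotation_primitive(phase, n_joints)
            phase += dphase
        else:
            # No rotation requested: hold the current pose.
            action = list(joints)
        demo.append(Step(proprio=list(joints),
                         tactile=read_tactile(joints),
                         action=action))
        # Idealized hand (assumption): commanded targets are reached exactly.
        joints = action
    return demo


# Example: the operator starts rotating at step 5.
demo = collect_demo(lambda t: t >= 5, fake_tactile)
```

Each recorded `Step` pairs real-world observations (proprioception and touch) with the executed action, which is the form of data that stage 3 consumes.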

Results

  • Higher task progress ratios than direct sim-to-real transfer on both screwdriving and nut-bolt fastening
  • Generalization to nuts/screwdrivers with diverse geometries not seen in training
  • Robust performance under external perturbations
  • Code and videos at dexscrew.github.io

Limitations

Author-stated:

  • Framework assumes the core motion primitive (rotational skill) can be learned in simplified simulation; may not generalize to tasks where the full contact dynamics are critical even for primitive learning

Unstated:

  • Teleoperation quality still bounds the quality of real-world demonstrations
  • Three-stage pipeline adds complexity vs. end-to-end approaches
  • Evaluation scope limited to screwdriving/fastening; generality to other dexterous tasks is untested

Reproducibility

  • Code/Data: dexscrew.github.io (project page; code listed as available there)
  • Hardware: Multi-fingered robot hand with tactile sensing
  • Compute: Standard RL + behavior cloning training scale

Insights

The key architectural insight is that motion primitives (the “how” of rotation) are learnable from simplified simulation, even if the full contact dynamics are not. This separates the skill acquisition problem (learned in sim) from the sensing/dynamics problem (learned from real demonstrations). It’s an elegant decomposition that sidesteps the sim-to-real gap without ignoring simulation entirely. The skill-primitive-as-teleoperation-aid is also clever: it enables the human to focus on high-level decisions while the robot handles low-level rotational control.
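The "learned from real demonstrations" half of this decomposition is ordinary supervised regression from observations to actions. The sketch below is illustrative only: it fits a linear policy by stochastic gradient descent on mean squared error, the data come from a hypothetical `make_synthetic_pairs` helper rather than real demonstrations, and the paper's actual policy architecture is not described in this note.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[List[float], List[float]]  # (observation, action)


def predict(W: Sequence[Sequence[float]], b: Sequence[float],
            obs: Sequence[float]) -> List[float]:
    """Linear policy: action = W @ obs + b."""
    return [sum(w * o for w, o in zip(row, obs)) + bias
            for row, bias in zip(W, b)]


def mse(W, b, pairs: Sequence[Pair]) -> float:
    """Mean squared error between predicted and demonstrated actions."""
    total, count = 0.0, 0
    for obs, act in pairs:
        for p, a in zip(predict(W, b, obs), act):
            total += (p - a) ** 2
            count += 1
    return total / count


def train_bc(pairs: Sequence[Pair], lr: float = 0.05,
             epochs: int = 200, seed: int = 0):
    """Behavior cloning: regress demonstrated actions on observations via SGD."""
    rng = random.Random(seed)
    obs_dim, act_dim = len(pairs[0][0]), len(pairs[0][1])
    W = [[0.0] * obs_dim for _ in range(act_dim)]
    b = [0.0] * act_dim
    data = list(pairs)
    for _ in range(epochs):
        rng.shuffle(data)
        for obs, act in data:
            pred = predict(W, b, obs)
            for i in range(act_dim):
                err = pred[i] - act[i]
                for j in range(obs_dim):
                    W[i][j] -= lr * err * obs[j]
                b[i] -= lr * err
    return W, b


def make_synthetic_pairs(n: int = 50, seed: int = 1) -> List[Pair]:
    """Synthetic stand-in for real demonstrations (assumption, not paper data)."""
    rng = random.Random(seed)
    pairs: List[Pair] = []
    for _ in range(n):
        proprio = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        tactile = [rng.uniform(0, 1)]
        obs = proprio + tactile  # observation = proprioception + touch
        action = [0.5 * obs[0] - 0.2 * obs[1] + 0.1 * obs[2] + 0.05]
        pairs.append((obs, action))
    return pairs


pairs = make_synthetic_pairs()
W, b = train_bc(pairs)
```

The observation here is the concatenation of proprioceptive and tactile readings, so the tactile signal enters the policy only at this stage and never has to be simulated.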

Connections

Raw Excerpt

The key idea is that the motion primitives underlying contact-rich dexterous manipulation do not need to be learned from a perfect physics model. A simplified simulator is sufficient to induce the core rotational behaviors required for these tasks.