Summary

This paper presents a method for transferring RL manipulation policies between robots with different morphologies (different joints, kinematics, and dynamics) using a shared latent space. By training encoders/decoders that map source and target robot state-action pairs to a common latent space via adversarial training and cycle consistency, the approach achieves zero-shot policy transfer without requiring target domain rewards or expert demonstrations.

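In short: the paper proposes a method for transferring manipulation policies between robots of different morphologies through a shared latent space, using adversarial training and a cycle-consistency loss to achieve zero-shot policy transfer without a target-domain reward function or expert demonstrations.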

Prerequisites

  • Reinforcement learning (RL) — the source domain policy is trained with RL; understanding policy gradients and MDPs is needed
  • Variational autoencoders / latent space representations — the core of the method is projecting robot states/actions into a shared latent space
  • Generative adversarial training — used to align the latent distributions of source and target domains
  • Cycle consistency (CycleGAN) — the key constraint ensuring unpaired cross-domain alignment without ground-truth correspondences
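
For readers less familiar with the last two prerequisites, their generic textbook forms can be written as below. This is a sketch of the standard GAN and CycleGAN-style objectives applied to a latent space, not the paper's exact losses; the symbols E_s, E_t (encoders), D_s, D_t (decoders), and C (discriminator) are illustrative names and do not follow the paper's notation.

```latex
% Generic forms only; all symbols are illustrative assumptions.
\begin{align}
\mathcal{L}_{\text{adv}} &= \mathbb{E}_{x_s}\big[\log C(E_s(x_s))\big]
  + \mathbb{E}_{x_t}\big[\log\big(1 - C(E_t(x_t))\big)\big] \\
\mathcal{L}_{\text{cyc}} &= \mathbb{E}_{x_s}\big\| D_s\big(E_t(D_t(E_s(x_s)))\big) - x_s \big\|_1
\end{align}
```

In this form the discriminator C maximizes the adversarial term while the target encoder E_t minimizes it, and the cycle term penalizes any source sample that is not recovered after a round trip through the target domain.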

Core Idea

Directly transferring a policy from one robot to another is not possible because their state and action spaces differ. Instead, the method trains the source robot's policy in an embodiment-agnostic latent space: encoders project states and actions into that space, and decoders map them back. At transfer time, only new encoders and decoders for the target robot are trained to align with the same latent space, using unpaired, unaligned data; no task reward or expert demonstrations are required. Adversarial training matches the latent distributions of the two robots, while the cycle consistency constraint (source → latent → target → latent → source should recover the original input) prevents the mapping from collapsing.
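
The round trip source → latent → target → latent → source can be made concrete with a small numerical sketch. The example below uses toy linear encoders and decoders in NumPy rather than the neural networks a real system would use; the dimensions and the helper names `make_linear_pair` and `cycle_loss` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical dimensions, chosen only for illustration.
SRC_DIM, TGT_DIM, LATENT_DIM = 7, 6, 4


def make_linear_pair(dim, latent_dim, rng):
    """Toy linear encoder (latent_dim x dim) and its pseudo-inverse decoder."""
    enc = rng.standard_normal((latent_dim, dim))
    dec = np.linalg.pinv(enc)  # shape (dim, latent_dim); enc @ dec is the identity
    return enc, dec


def cycle_loss(x_src, enc_src, dec_src, enc_tgt, dec_tgt):
    """Mean squared error after source -> latent -> target -> latent -> source."""
    z = x_src @ enc_src.T        # source -> latent
    x_tgt = z @ dec_tgt.T        # latent -> target
    z_back = x_tgt @ enc_tgt.T   # target -> latent
    x_rec = z_back @ dec_src.T   # latent -> source
    return float(np.mean((x_rec - x_src) ** 2))


rng = np.random.default_rng(0)
enc_src, dec_src = make_linear_pair(SRC_DIM, LATENT_DIM, rng)
enc_tgt, dec_tgt = make_linear_pair(TGT_DIM, LATENT_DIM, rng)

# Sample source data in the image of the source decoder, so it is exactly encodable.
x_src = rng.standard_normal((32, LATENT_DIM)) @ dec_src.T

aligned = cycle_loss(x_src, enc_src, dec_src, enc_tgt, dec_tgt)

# A target encoder that does not invert the target decoder breaks the cycle.
bad_enc_tgt = rng.standard_normal((LATENT_DIM, TGT_DIM))
misaligned = cycle_loss(x_src, enc_src, dec_src, bad_enc_tgt, dec_tgt)

print(f"aligned cycle loss:    {aligned:.2e}")
print(f"misaligned cycle loss: {misaligned:.2e}")
```

When the target encoder and decoder agree with each other, the round trip reproduces the source sample up to floating-point error; when they do not, the loss is clearly nonzero. In training, a loss of this kind would be minimized alongside the adversarial term.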

Results

Setting                        Transfer
Panda → Sawyer (sim-to-sim)    Successful pick-and-place transfer
Panda → xArm6 (sim-to-real)    Successful real-world deployment

Both transfers are zero-shot, with no target-domain RL training or expert demonstrations.

Limitations

  • Author-stated: Assumes both robots can solve the same task at roughly the same speed (temporal alignment assumption)
  • Author-stated: Source code is available, but reproducing the real-world setup requires specific hardware
  • Unstated: The approach requires the task to be achievable by both embodiments; it may struggle when kinematic constraints make the same strategy infeasible on the target robot
  • Unstated: Evaluated on relatively simple pick-and-place; contact-rich or dexterous tasks may be harder to transfer

Reproducibility

Insights

The cycle consistency insight from CycleGAN applied to robot embodiment transfer is elegant — it sidesteps the need for paired demonstrations, which is a major practical barrier. The three-stage pipeline (source policy learning, target alignment, deployment) cleanly separates concerns. This work predates the VLA/foundation model era; a future direction is whether large pre-trained robot policies make explicit latent alignment unnecessary via in-context adaptation.
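
The separation of concerns in the three-stage pipeline can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class names `LatentPolicy` and `RobotInterface`, the linear maps, the state and action dimensions, and the use of random weights as stand-ins for trained components. It shows only the deployment-time data flow, in which one frozen latent policy is reused across robots.

```python
import numpy as np


class LatentPolicy:
    """Embodiment-agnostic policy: latent state in, latent action out."""

    def __init__(self, weight):
        self.weight = weight

    def act(self, z_state):
        return np.tanh(self.weight @ z_state)


class RobotInterface:
    """Per-robot state encoder and action decoder (linear stand-ins)."""

    def __init__(self, state_dim, action_dim, latent_dim, rng):
        self.state_enc = rng.standard_normal((latent_dim, state_dim))
        self.action_dec = rng.standard_normal((action_dim, latent_dim))

    def encode_state(self, state):
        return self.state_enc @ state

    def decode_action(self, z_action):
        return self.action_dec @ z_action


def control_step(policy, robot, state):
    """One deployment step: robot state -> latent -> policy -> robot action."""
    z_action = policy.act(robot.encode_state(state))
    return robot.decode_action(z_action)


rng = np.random.default_rng(0)
LATENT_DIM = 4  # hypothetical; latent states and actions share it for simplicity

# Stage 1 (stand-in): the policy and source interface would be trained with RL.
policy = LatentPolicy(rng.standard_normal((LATENT_DIM, LATENT_DIM)))
source = RobotInterface(state_dim=10, action_dim=7, latent_dim=LATENT_DIM, rng=rng)

# Stage 2 (stand-in): only the target interface would be trained, via alignment.
target = RobotInterface(state_dim=9, action_dim=6, latent_dim=LATENT_DIM, rng=rng)

# Stage 3: the same frozen policy drives either robot.
a_src = control_step(policy, source, np.ones(10))
a_tgt = control_step(policy, target, np.ones(9))
print(a_src.shape, a_tgt.shape)
```

The policy object never sees robot-specific dimensions, so swapping the interface is the only change needed to move between embodiments in this sketch.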

Connections

Raw Excerpt

To achieve cross-embodiment policy transfer, our key insight is to project the state and action spaces of the source and target robots to a common latent space representation.