Summary
Physical Intelligence introduces RLT (RL tokens), a method that extracts a compact latent representation from Vision-Language-Action (VLA) models and uses it to train lightweight actor-critic networks directly on-robot via online RL. The approach achieves up to 3× speed improvement on precision manipulation tasks (screwdriver, zip tie, Ethernet, and charger insertion) using only minutes to hours of real-world data. Rather than replacing the VLA’s predicted action, the actor learns to refine it, preserving the VLA’s generalization while adding precision.
Key Points
- Extracts a compressed “RL token” from VLA embeddings via an encoder-decoder bottleneck
- Lightweight actor-critic networks train on-device at hundreds of updates per second
- Actor edits the VLA’s predicted action rather than replacing it — keeps baseline behavior intact
- Regularization constrains exploration near baseline, deviating only when beneficial
- Results: screwdriver 1.7 → 14 successes per 10 min; Ethernet 147 → 400; charger 136 → 600
- 50% of Ethernet insertion trials exceeded all human teleoperation speeds
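The bottleneck and “edit, don’t replace” ideas above can be sketched in a few lines. This is a minimal illustration, not the post’s implementation: the linear encoder/decoder, the residual actor, the dimensions, and the L2 deviation penalty are all assumed for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM, TOKEN_DIM, ACT_DIM = 64, 8, 7  # illustrative sizes only

# Encoder-decoder bottleneck: compress the frozen VLA embedding
# into a small "RL token" that serves as the RL state.
W_enc = rng.normal(0, 0.1, (TOKEN_DIM, EMB_DIM))
W_dec = rng.normal(0, 0.1, (EMB_DIM, TOKEN_DIM))  # decoder used only for training the bottleneck

def rl_token(vla_embedding):
    """Compressed latent extracted from the VLA embedding."""
    return W_enc @ vla_embedding

# Lightweight actor: receives the RL token AND the VLA's predicted
# action, and outputs a small bounded edit (residual) to that action.
W_actor = rng.normal(0, 0.01, (ACT_DIM, TOKEN_DIM + ACT_DIM))

def actor(token, vla_action, scale=0.05):
    x = np.concatenate([token, vla_action])
    delta = np.tanh(W_actor @ x)       # bounded edit in (-1, 1)
    return vla_action + scale * delta  # stays close to the baseline action

def regularized_objective(reward, token, vla_action, beta=1.0):
    """Reward minus an (assumed) L2 penalty on deviating from the VLA action,
    so the policy only strays from the baseline when it pays off."""
    edit = actor(token, vla_action) - vla_action
    return reward - beta * float(edit @ edit)

emb = rng.normal(size=EMB_DIM)
base_action = rng.normal(size=ACT_DIM)
edited = actor(rl_token(emb), base_action)
print(float(np.abs(edited - base_action).max()))  # always below the 0.05 bound
```

Because the actor outputs only a bounded residual, the system degrades gracefully: with zero weights it reproduces the VLA’s behavior exactly, which matches the note’s point about keeping the baseline intact.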
Insights
- The “edit, don’t replace” framing is architecturally elegant: it sidesteps catastrophic forgetting by keeping the VLA frozen and layering adaptation on top
- Sample efficiency is the crux — prior RL-for-robotics work often required thousands of environment steps; minutes-to-hours of real data is a step change
- On-device training at hundreds of updates per second suggests the critic and actor are extremely lightweight, likely far smaller than the VLA itself
- The contact-rich, sub-millimeter precision regime is exactly where imitation learning from human demos hits a ceiling (human demos are noisy at that scale), so RL’s ability to optimize directly for success is load-bearing here
- This points toward a deployment paradigm where robots ship with a capable-but-imprecise base policy and self-improve in the field
Connections
- Vision-Language-Action Models
- Reinforcement Learning
- Physical Intelligence
- Robot Manipulation
- Online Learning
- Lessons from Building Claude Code: How We Use Skills
Raw Excerpt
“the actor receives the VLA’s predicted action as input, so it learns to edit the VLA action rather than replace it entirely.”