RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

Mar 30, 2026 · 1 min read

TL;DR: RT-2 is a single model, trained end to end, that maps robot observations to actions while benefiting from large-scale pretraining on web language and vision-language data.

Appeared in surveys

2026-03-30-cross-embodiment-manipulation

Backlinks

Cross-Embodiment Robot Manipulation — Benchmarks & Datasets
