TL;DR A single end-to-end trained model that learns to map robot observations to actions while benefiting from large-scale pretraining on web language and vision-language data.
TL;DR A single end-to-end trained model that learns to map robot observations to actions while benefiting from large-scale pretraining on web language and vision-language data.