TL;DR This work uses the simulator to train both high-level vision-language policies and low-level reinforcement learning policies; after training purely in simulation, each can be deployed zero-shot in the real world. It also shows that other kinds of models, such as video captioning models, can benefit from training on simulated experience, opening up even wider applications.
