TL;DR: An automatic synthetic data generation pipeline is introduced that instruction-tunes VLMs to robotic domains and needs, outperforming state-of-the-art VLMs and visual prompting techniques by 21.8% and 30.5%, respectively.