TL;DR: 3D-VLA is a new family of embodied foundation models that links 3D perception, reasoning, and action through a generative world model, significantly improving reasoning, multimodal generation, and planning capabilities in embodied environments.