3D-VLA: A 3D Vision-Language-Action Generative World Model

TL;DR 3D-VLA is a new family of embodied foundation models that link 3D perception, reasoning, and action through a generative world model, significantly improving reasoning, multimodal generation, and planning capabilities in embodied environments.

Appeared in surveys

2026-03-30-pointworld-3d-world-models
