SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model

Mar 30, 2026 · 1 min read

TL;DR: SpatialVLA introduces two components:

- **Ego3D Position Encoding** injects 3D information into the input observations of the visual-language-action model.
- **Adaptive Action Grids** represent spatial robot movement actions as adaptively discretized action grids.

Together, these help the model learn generalizable and transferable spatial action knowledge for cross-robot control.
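The sketch below illustrates the idea of adaptively discretized action grids. It is a minimal Python example that rests on several assumptions:

- Each action dimension is modeled by a Gaussian fitted to training data.
- Bin edges are placed at that Gaussian's quantiles, so every bin carries equal probability mass.

This is one common way to build a data-adaptive grid, not necessarily SpatialVLA's exact procedure. The class `AdaptiveGrid1D` and its methods are hypothetical names used only for illustration.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


class AdaptiveGrid1D:
    """Hypothetical 1-D adaptive action grid (illustrative, not the paper's code).

    Assumption: one action dimension is modeled by a Gaussian fitted to
    observed samples, and bin edges are placed at that Gaussian's quantiles.
    Each of the `num_bins` bins then holds equal probability mass under the
    fit: frequent values get narrow bins, rare extreme values share wide bins.
    """

    def __init__(self, samples, num_bins):
        if num_bins < 2:
            raise ValueError("num_bins must be >= 2")
        sigma = pstdev(samples)
        if sigma == 0:
            raise ValueError("samples must not be constant")
        # Fit a Gaussian to the samples (population mean and std deviation).
        self.dist = NormalDist(fmean(samples), sigma)
        self.num_bins = num_bins
        # Interior edges at the k/num_bins quantiles, k = 1 .. num_bins - 1.
        self.edges = [self.dist.inv_cdf(k / num_bins) for k in range(1, num_bins)]

    def encode(self, value):
        """Map a continuous value to a bin index (a discrete action token)."""
        return bisect_right(self.edges, value)

    def decode(self, token):
        """Map a bin index back to a representative value: the bin's median under the fit."""
        if not 0 <= token < self.num_bins:
            raise ValueError("token out of range")
        return self.dist.inv_cdf((token + 0.5) / self.num_bins)


if __name__ == "__main__":
    grid = AdaptiveGrid1D([-1.0, -0.5, 0.0, 0.5, 1.0], num_bins=4)
    print([round(e, 3) for e in grid.edges])
    print(grid.encode(0.3), round(grid.decode(grid.encode(0.3)), 3))
```

Compared with uniform bins over a fixed range, quantile-placed edges spend more of the token vocabulary where the action data is dense. A full robot action would need one such grid per action dimension, or a joint grid. This sketch does not claim to reproduce how SpatialVLA factorizes its action space.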

Appeared in surveys

2026-03-30-pointworld-3d-world-models

Backlinks

PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation
PointWorld & 3D World Models for Robotic Manipulation
