PointWorld & 3D World Models for Robotic Manipulation
Research Focus
A literature survey centered on NVIDIA’s PointWorld (arXiv:2601.03782) and the broader research area of 3D point cloud representations, world models, and point tracking for robotic manipulation. It covers foundational 3D deep learning, learned world models for control, point tracking methods, and recent 3D vision-language-action (VLA) approaches.
DOI List
10.48550/arXiv.1803.10122
10.48550/arXiv.1612.00593
10.48550/arXiv.1706.02413
10.48550/arXiv.2209.05451
10.48550/arXiv.2301.04104
10.48550/arXiv.2306.08637
10.48550/arXiv.2306.17817
10.48550/arXiv.2307.07635
10.48550/arXiv.2308.16891
10.48550/arXiv.2310.06114
10.48550/arXiv.2310.16828
10.48550/arXiv.2403.03954
10.48550/arXiv.2403.09631
10.48550/arXiv.2406.10721
10.48550/arXiv.2501.15830
10.48550/arXiv.2601.03782
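Every entry in the list above uses the arXiv-issued DOI form `10.48550/arXiv.<id>`, so the arXiv identifier and abstract URL can be recovered mechanically. The following is a minimal sketch of that mapping. The function names (`arxiv_id_from_doi`, `arxiv_abs_url`, `submission_month`) are hypothetical helpers introduced here for illustration and are not part of any library. The sketch assumes new-style arXiv identifiers (`YYMM.NNNNN`), which holds for every DOI in this list.

```python
ARXIV_DOI_PREFIX = "10.48550/arXiv."


def arxiv_id_from_doi(doi):
    """Strip the arXiv DOI prefix and return the bare arXiv identifier."""
    doi = doi.strip()
    if not doi.startswith(ARXIV_DOI_PREFIX):
        raise ValueError("not an arXiv-issued DOI: " + doi)
    return doi[len(ARXIV_DOI_PREFIX):]


def arxiv_abs_url(doi):
    """Build the arXiv abstract-page URL for an arXiv DOI."""
    return "https://arxiv.org/abs/" + arxiv_id_from_doi(doi)


def submission_month(doi):
    """Return (year, month) encoded in the YYMM part of a new-style arXiv ID.

    Assumption: the identifier is new-style (YYMM.NNNNN), so the year is 20YY.
    """
    yymm = arxiv_id_from_doi(doi).split(".")[0]
    return 2000 + int(yymm[:2]), int(yymm[2:])


if __name__ == "__main__":
    for doi in ("10.48550/arXiv.1612.00593", "10.48550/arXiv.2601.03782"):
        print(doi, arxiv_abs_url(doi), submission_month(doi))
```

Note that the month encoded in the identifier is the first arXiv submission date, which can differ from the year of the eventual conference or journal version.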
Paper Nodes
- ha-2018-world-models
- qi-2017-pointnet
- qi-2017-pointnet-plus-plus
- shridhar-2023-peract
- hafner-2023-dreamerv3
- doersch-2023-tapir
- gervet-2023-act3d
- karaev-2023-cotracker
- ze-2023-gnfactor
- yang-2023-unisim
- hansen-2023-tdmpc2
- ze-2024-3d-diffusion-policy
- zhu-2024-3d-vla
- yuan-2024-robopoint
- spatialvla-2025
- huang-2026-pointworld
Synthesis Matrix
| Paper | Year | Contribution type | Core representation | Cross-embodiment | Primary metric |
|---|---|---|---|---|---|
| World Models | 2018 | Theoretical foundation | Latent RNN | N/A | Simulated environment score |
| PointNet | 2017 | Theoretical foundation | Point set (permutation-invariant) | N/A | ModelNet40 accuracy |
| PointNet++ | 2017 | Architecture | Hierarchical point set | N/A | Classification/segmentation |
| PerAct | 2023 | Manipulation benchmark | 3D voxel | No | RLBench success rate |
| DreamerV3 | 2023 | World model | Latent (image-based) | No | 150+ task success |
| TAPIR | 2023 | Point tracking | Per-frame 2D initialization + temporal refinement | N/A | TAP-Vid benchmark |
| Act3D | 2023 | Manipulation policy | 3D feature field | No | RLBench success rate |
| CoTracker | 2023 | Point tracking | Joint 2D tracking | N/A | Point tracking benchmarks |
| GNFactor | 2023 | Manipulation policy | Neural feature field (3D voxel) | No | RLBench success rate |
| UniSim | 2023 | World model | Video-based simulator | Partial | Zero-shot policy transfer |
| TD-MPC2 | 2023 | World model | Latent (image-based) | No | 104 continuous control tasks |
| 3D Diffusion Policy | 2024 | Manipulation policy | 3D point cloud | No | Manipulation success rate |
| 3D-VLA | 2024 | VLA + world model | 3D scene + generative | Partial | Embodied reasoning |
| RoboPoint | 2024 | Spatial reasoning | VLM + spatial keypoints | No | Affordance prediction (+21.8%) |
| SpatialVLA | 2025 | VLA | Ego3D encoding | Yes | Cross-robot task success |
| PointWorld | 2026 | 3D world model | 3D point flow | Yes | Zero-shot real-world tasks |
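
The "permutation-invariant" entry in the PointNet row refers to a set function whose output does not depend on the order of the input points. A common way to get this property is to apply the same function to every point and then aggregate with a symmetric operation such as an element-wise max. The sketch below illustrates only that idea in plain Python. The per-point feature map `point_features` uses arbitrary hand-picked weights rather than a learned network, and both function names are hypothetical, so this is a toy illustration and not the PointNet architecture itself.

```python
import random


def point_features(point):
    """Toy per-point feature map h(p); the weights are arbitrary, not learned."""
    x, y, z = point
    return (x + 0.5 * y, y - z, 0.25 * x + z, x * y * z)


def global_feature(points):
    """Aggregate per-point features with an element-wise max (a symmetric op)."""
    feats = [point_features(p) for p in points]
    # zip(*feats) groups the k-th feature of every point into one column.
    return tuple(max(column) for column in zip(*feats))


cloud = [(0.0, 1.0, 2.0), (1.5, -0.5, 0.0), (-1.0, 2.0, 0.5), (0.3, 0.3, 0.3)]

# Reorder the same points with a fixed seed so the run is reproducible.
shuffled = cloud[:]
random.Random(0).shuffle(shuffled)

# The global feature is identical regardless of point order.
assert global_feature(cloud) == global_feature(shuffled)
```

Because the max is taken independently per feature dimension, any reordering of `cloud` yields the same tuple. Replacing `max` with an order-dependent operation (for example, taking the first point's features) would break the property.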