Developments and Challenges towards Dexterous and Embodied Robotic Manipulation

This article was generated by AI analysis.

Created: 2026-03-26 Source: https://arxiv.org/abs/2507.11840

Summary

This July 2025 survey traces the full arc of robotic manipulation — from mechanical programming through learned controllers to embodied intelligence — with particular focus on contemporary dexterous systems. Two research directions dominate: data collection (simulation, human demonstrations, teleoperation) and skill learning (imitation and reinforcement learning). The paper identifies three fundamental obstacles currently blocking progress toward truly capable dexterous robots.

Prerequisites

History of robot manipulation: the survey traces a mechanical → programmed → learning-based → embodied progression, so context for each transition helps
Gripper design: parallel jaw → multi-finger → dexterous hands; the mechanical constraints of each explain why learning approaches differ by gripper
Sim-to-real transfer: simulation is used for pretraining and scaling, so understanding the sim-to-real domain gap is necessary
VLA (Vision-Language-Action) models: the survey culminates in VLAs as the frontier policy architecture, so background in transformer-based multimodal models helps

Core Idea

Robotic manipulation is undergoing a paradigm shift: the robot’s body, sensors, and world model are increasingly treated as a unified “embodied” system rather than as separable components. This shift mirrors what happened in NLP (from pipelines to end-to-end transformers) and in vision (from hand-crafted features to learned representations). The key implication: data collection and policy learning are no longer separate phases. They must be co-designed, because the data distribution largely bounds what the policy can do.
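The claim that the data distribution bounds the policy can be made concrete with a toy behavior-cloning sketch. This is an illustration under stated assumptions, not anything taken from the survey: the expert controller `expert_action`, the one-dimensional state, and the linear policy class are all hypothetical. The cloned policy tracks the expert where demonstrations exist and diverges sharply on states the demonstrations never covered.

```python
import math
import random


def expert_action(state: float) -> float:
    # Hypothetical nonlinear expert controller (stands in for a human demonstrator).
    return math.sin(3.0 * state)


def collect_demos(lo: float, hi: float, n: int, seed: int = 0) -> list[tuple[float, float]]:
    # Demonstrations only cover states in [lo, hi]; this range is the "data distribution".
    rng = random.Random(seed)
    states = [rng.uniform(lo, hi) for _ in range(n)]
    return [(s, expert_action(s)) for s in states]


def fit_linear_bc(demos: list[tuple[float, float]]):
    # Behavior cloning as ordinary least squares on (state, action) pairs: a ≈ w * s + b.
    n = len(demos)
    mean_s = sum(s for s, _ in demos) / n
    mean_a = sum(a for _, a in demos) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in demos)
    var = sum((s - mean_s) ** 2 for s, _ in demos)
    w = cov / var
    b = mean_a - w * mean_s
    return lambda s: w * s + b


def mean_abs_error(policy, lo: float, hi: float, n: int = 200) -> float:
    # Compare the cloned policy with the expert on an evenly spaced grid of states.
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return sum(abs(policy(s) - expert_action(s)) for s in grid) / n


if __name__ == "__main__":
    policy = fit_linear_bc(collect_demos(0.0, 0.3, n=50))
    print(f"in-distribution error:     {mean_abs_error(policy, 0.0, 0.3):.3f}")
    print(f"out-of-distribution error: {mean_abs_error(policy, 1.0, 2.0):.3f}")
```

Widening the demonstrated range (the data-collection decision) fixes the first failure only partly; a linear policy still cannot represent the expert, which is the policy-design half of the same co-design problem.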

Results

Survey findings:

Historical arc: mechanical (1960s) → programmed (1980s) → learning-based (2010s) → embodied AI (2020s)
Gripper evolution follows a parallel path: parallel jaw → multi-finger → dexterous hands (22+ DOF)
Policy progression: BC → GAIL → diffusion policies → VLA foundation models
Three fundamental obstacles: not named in the abstract; identifying them requires the full paper
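For reference, the first three stages of this policy progression can be written as standard training objectives. These are the common textbook forms (BC as maximum likelihood, GAIL in its original saddle-point formulation, and the DDPM-style noise-prediction loss used by diffusion policies), not formulations taken from the survey. Here $\mathcal{D}$ is the demonstration dataset, $\pi_E$ the expert policy, $s$ the state or observation, $a^0$ a clean demonstrated action, $k$ the diffusion step, and $\bar\alpha_k$ the noise schedule.

```latex
% Behavior cloning: maximum likelihood on demonstrated (state, action) pairs
\mathcal{L}_{\mathrm{BC}}(\theta) = -\,\mathbb{E}_{(s,a)\sim\mathcal{D}}\big[\log \pi_\theta(a \mid s)\big]

% GAIL: saddle point between policy \pi and discriminator D : \mathcal{S}\times\mathcal{A}\to(0,1)
\min_{\pi}\max_{D}\; \mathbb{E}_{\pi}\big[\log D(s,a)\big] + \mathbb{E}_{\pi_E}\big[\log\big(1 - D(s,a)\big)\big] - \lambda H(\pi)

% Diffusion policy: predict the noise added to a^0, conditioned on s and k
\mathcal{L}_{\mathrm{DP}}(\theta) = \mathbb{E}_{(s,a^0)\sim\mathcal{D},\,k,\,\epsilon\sim\mathcal{N}(0,I)}
  \Big[\big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_k}\,a^0 + \sqrt{1-\bar\alpha_k}\,\epsilon,\; s,\; k\big)\big\|^2\Big]
```

All three learn only from what $\mathcal{D}$ (or, for GAIL, the expert rollouts) covers; GAIL additionally needs environment interaction to sample from $\pi$.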

Limitations

Author-stated: identifies three obstacles but does not fully resolve them (July 2025 snapshot)
Unstated: the “embodied intelligence” framing may overstate the extent to which current VLAs have genuine world models vs. statistical pattern matching

Reproducibility

Code: survey paper; references individual systems
Datasets: references standard manipulation benchmarks across multiple categories
Compute: not applicable (survey)

Insights

The framing as “embodied” rather than just “manipulation” reflects a field-wide shift with significant implications: if the robot’s body and sensors are part of the model, then hardware choices become research decisions. A policy trained on one robot’s embodiment generally does not transfer directly to another. VLAs partially address this through language conditioning, since the same model can be prompted differently per robot, but the embodiment gap remains an open research problem.
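One engineering response used in some cross-embodiment policies, shown here as a hypothetical sketch rather than anything the survey specifies, is a shared trunk with a separate action head per robot. The class name `MultiEmbodimentPolicy`, the robot keys, and the 7-D and 22-D action sizes are illustrative assumptions, and the weights are random rather than trained. The sketch makes the hardware dependence explicit: each body needs its own output layer sized to its action space, and an unseen body has no head at all.

```python
import math
import random

FEATURE_DIM = 8  # Hypothetical size of the shared representation.


def make_matrix(rows: int, cols: int, rng: random.Random) -> list[list[float]]:
    # Small random weights; a real system would learn these from data.
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class MultiEmbodimentPolicy:
    """Hypothetical shared trunk with one linear action head per robot."""

    def __init__(self, obs_dim: int, action_dims: dict[str, int], seed: int = 0):
        rng = random.Random(seed)
        self.obs_dim = obs_dim
        # Shared trunk: reused by every embodiment.
        self.trunk = make_matrix(FEATURE_DIM, obs_dim, rng)
        # Embodiment-specific heads: output size equals that robot's action dimension.
        self.heads = {
            name: make_matrix(dim, FEATURE_DIM, rng) for name, dim in action_dims.items()
        }

    def act(self, robot: str, obs: list[float]) -> list[float]:
        if len(obs) != self.obs_dim:
            raise ValueError(f"expected {self.obs_dim}-D observation, got {len(obs)}")
        if robot not in self.heads:
            # A new body has no head (and no data), so the policy cannot drive it.
            raise KeyError(f"no action head for embodiment {robot!r}")
        features = [math.tanh(x) for x in matvec(self.trunk, obs)]
        return matvec(self.heads[robot], features)


if __name__ == "__main__":
    policy = MultiEmbodimentPolicy(
        obs_dim=16, action_dims={"parallel_jaw_arm": 7, "dexterous_hand": 22}
    )
    obs = [0.1] * 16
    print(len(policy.act("parallel_jaw_arm", obs)))  # 7
    print(len(policy.act("dexterous_hand", obs)))  # 22
```

Only the trunk is shared; adding a robot still means adding a head and collecting data for it, which is why the embodiment gap stays a data problem as much as a modeling one.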

Connections

Raw Excerpt

Surveys the evolution of robotic manipulation systems progressing from mechanical programming to embodied intelligence, alongside advances in gripper technology. Focuses on two primary research directions: data collection approaches and skill-learning methodologies.
