Multimodal Perception-Driven Decision-Making for HRI: Survey

This note was generated by AI analysis.

Created: 2026-03-22 Source: https://www.frontiersin.org/journals/robotics-and-ai/articles/10.3389/frobt.2025.1604472/full

Summary

A 20-year survey (2004-2024) of Multimodal Perception-Driven Decision-Making (MPDDM) frameworks for HRI, covering how robots integrate vision, language, and tactile sensing to act in human environments. It spans four application domains and identifies generalization and sensor fusion as the field’s deepest unsolved challenges.

Key Points

Four application domains: social/assistive robotics, navigation, industrial cobots, general-purpose robotics
Modalities combined: vision (dominant), language, tactile, audio, proprioception
Core challenge: sensor fusion complexity — noise and timing misalignment across modalities
Generalization failure: models trained in lab environments fail in real-world deployment
Human variability (behavior, culture, context) makes policy learning extremely hard
Future direction: adaptive fusion + efficient learning + human-trusted decision-making
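
The sensor fusion challenge named above (noise plus timing misalignment across modalities) can be made concrete with a small sketch. This is not a method from the survey. It is a minimal illustration that assumes each modality yields timestamped scalar estimates of the same quantity with a known noise variance. The names `nearest_sample`, `fuse_inverse_variance`, `fuse_at`, and the `max_skew` tolerance are all hypothetical.

```python
from bisect import bisect_left
from typing import Dict, Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, scalar measurement)


def nearest_sample(stream: Sequence[Sample], t: float,
                   max_skew: float) -> Optional[Sample]:
    """Return the sample closest in time to t, or None if the closest
    one is more than max_skew seconds away. stream must be time-sorted."""
    if not stream:
        return None
    times = [ts for ts, _ in stream]
    i = bisect_left(times, t)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = stream[max(i - 1, 0): i + 1]
    best = min(candidates, key=lambda s: abs(s[0] - t))
    return best if abs(best[0] - t) <= max_skew else None


def fuse_inverse_variance(
        estimates: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine (value, variance) pairs, weighting each by 1 / variance.
    Returns the fused value and its variance."""
    if not estimates:
        raise ValueError("need at least one estimate")
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, 1.0 / total


def fuse_at(t: float,
            streams: Dict[str, Tuple[Sequence[Sample], float]],
            max_skew: float = 0.05) -> Optional[Tuple[float, float]]:
    """Align every modality to time t, drop modalities with no sample
    inside the skew tolerance, then fuse whatever remains."""
    aligned = []
    for samples, variance in streams.values():
        sample = nearest_sample(samples, t, max_skew)
        if sample is not None:
            aligned.append((sample[1], variance))
    return fuse_inverse_variance(aligned) if aligned else None


# Hypothetical data: vision is assumed less noisy than tactile here.
vision = [(0.00, 1.0), (0.10, 1.2)]
tactile = [(0.02, 2.0), (0.30, 2.5)]
streams = {"vision": (vision, 1.0), "tactile": (tactile, 4.0)}
fused = fuse_at(0.01, streams)
```

Dropping a modality whose nearest sample falls outside `max_skew` is one simple way to handle misalignment. Interpolating between neighbouring samples, or learning the fusion weights adaptively, are alternatives closer to the adaptive fusion direction the survey points to.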

Insights

The 20-year retrospective makes one pattern clear: each modality (vision, language, touch) has been extensively studied in isolation. The frontier is their integration, which turns out to be much harder than simply combining the individual components
“Human-trusted decision-making” as a future direction signals that the field is moving beyond capability (can the robot perceive?) toward legitimacy (will humans accept the robot’s decisions?)
The generalization problem in MPDDM mirrors the open/closed model gap in vision-language-action models (VLAs): simulation performance is high, real-world deployment is fragile
Tactile sensing is notably underrepresented despite being essential for manipulation tasks — GR-Dexter’s piezoresistive arrays are a rare hardware contribution to this gap

Connections

Raw Excerpt

The survey advocates for adaptive multimodal fusion techniques, more efficient learning paradigms, and human-trusted decision-making frameworks to advance HRI research.
