Human-Robot Interaction: Current Research Landscape
Research Question
What are the active research directions, key challenges, and open problems in Human-Robot Interaction (HRI) as of 2025-2026?
Knowledge Map
Prerequisite areas to understand before diving into HRI research:
- Robotics Fundamentals (Kinematics, Control Theory) — HRI is built on top of robots that can move reliably; without understanding how robots plan and control motion, it’s impossible to reason about safety or physical interaction constraints
- Computer Vision and Perception — robots perceive humans through cameras, depth sensors, and tactile arrays; most HRI systems depend heavily on visual recognition (gesture, pose, face, gaze)
- Natural Language Processing (NLP) — with LLMs now central to HRI, understanding how language models represent meaning, generate responses, and reason about context is essential
- Human Factors and Cognitive Science — HRI is fundamentally about humans; understanding how people form mental models of robots, develop trust, and experience fear or discomfort is as important as the robot’s technical capabilities
- Machine Learning (Supervised, Reinforcement, Imitation Learning) — the primary methods for teaching robots behaviors in HRI contexts, especially imitation from human demonstrations
- Sensor Fusion — HRI systems combine vision, audio, tactile, and proprioceptive signals; understanding how to integrate noisy heterogeneous data streams is a prerequisite for building robust systems
- Ethics and Safety Engineering — physical proximity between humans and robots creates injury risk; familiarity with ISO/TS 15066 and broader AI ethics frameworks is needed to reason about deployment constraints
Sources Gathered
New sources clipped and analyzed during this research:
- Clippings-hri-llm-systematic-review — Systematic review of 86 LLM+HRI papers; identifies human-centered gaps (CHI 2026)
- Clippings-physical-hri-safety-constraints — Critical review of ISO/TS 15066 safety standards for physical HRI
- Clippings-multimodal-perception-hri-survey — 20-year survey of multimodal perception frameworks across 4 HRI domains
- Clippings-vln-human-robot-collaboration-survey — Survey of VLN for HRC; identifies bidirectionality and multi-agent gaps
Existing vault notes referenced:
- Clippings-gr-dexter-bimanual-dexterous-vla — Dexterous VLA by ByteDance; direct example of physical HRI at the manipulation level
- Clippings-state-of-vla-research-iclr-2026 — VLA research landscape; overlaps significantly with HRI’s “robot learning from human demonstration” thread
Key Findings
HRI in 2025-2026 is splitting into two tracks that are developing somewhat independently:
Physical HRI vs. Social HRI. Physical HRI (cobots in manufacturing, surgical robots, dexterous manipulation) is primarily a control and safety problem. The challenge is defining what “safe” means — ISO/TS 15066 provides force/pressure limits, but these rest on contested biomechanical assumptions, and critics argue the right fundamental unit is energy transfer, not instantaneous force. Social HRI (companion robots, assistive systems, conversational agents) is primarily a perception and interaction design problem. The challenge is recognizing human state (emotion, intent, attention) and generating responses that humans find appropriate, not just functional.
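The force-versus-energy distinction can be made concrete. If a transient contact is modeled as a linear spring (the standard simplification for power-and-force-limited collaboration), a force limit implies both a maximum transferable energy and a maximum relative speed. The sketch below derives those from the spring model; the numeric values (effective masses, force limit, tissue stiffness) are illustrative assumptions, not the standard's tabulated limits.

```python
import math

def reduced_mass(m_human: float, m_robot: float) -> float:
    """Two-body reduced mass mu = (1/mH + 1/mR)^-1, in kg."""
    return 1.0 / (1.0 / m_human + 1.0 / m_robot)

def max_transfer_energy(f_max: float, k: float) -> float:
    """Energy a contact may transfer before exceeding the force limit,
    modeling the body region as a linear spring: E = F^2 / (2k)."""
    return f_max ** 2 / (2.0 * k)

def max_relative_speed(f_max: float, k: float, mu: float) -> float:
    """Speed cap so that 0.5 * mu * v^2 <= E_max, i.e. v = F / sqrt(mu * k)."""
    return f_max / math.sqrt(mu * k)

# Illustrative (not normative) values for a hand/finger contact region.
mu = reduced_mass(m_human=0.6, m_robot=4.0)    # kg, assumed effective masses
f_max = 140.0                                   # N, assumed transient force limit
k = 75_000.0                                    # N/m, assumed tissue stiffness
print(f"E_max = {max_transfer_energy(f_max, k) * 1000:.1f} mJ")
print(f"v_max = {max_relative_speed(f_max, k, mu):.3f} m/s")
```

Note what the derivation exposes: the permissible speed depends on the robot's effective mass at the contact point, which is exactly the quantity that is hard to bound for a 21+ DOF dexterous hand.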
LLMs are the biggest recent inflection point. Before LLMs, robots had scripted interaction patterns or narrow learned policies. LLMs enable contextual sensing (understanding situational context from language), socially grounded dialogue, and common-sense reasoning. The 2026 CHI systematic review of 86 papers confirms this shift is real and broad — but the research is fragmented, lacks standardized evaluation, and overindexes on technical capability while underinvesting in human-centered factors like user modeling and appropriate autonomy.
Multimodal perception is the technical frontier. Robots that can only see (or only hear) are fundamentally limited. The frontier is integrating vision, language, and touch — but sensor fusion at this scale is hard. The 20-year retrospective shows each modality was studied in isolation for decades; integration is the active open problem. Generalization from lab to real-world remains unresolved.
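One minimal integration pattern is confidence-weighted late fusion: each modality produces its own belief over human intent, and beliefs are combined with reliability weights (so, for example, audio can be discounted in a noisy factory). The intent labels, modality names, and weights below are illustrative assumptions, not a reference architecture from the survey.

```python
# Confidence-weighted late fusion of per-modality intent estimates:
# one minimal way to combine heterogeneous HRI sensor streams.

def fuse(predictions: dict[str, dict[str, float]],
         reliability: dict[str, float]) -> dict[str, float]:
    """predictions: modality -> {intent label: probability}.
    reliability: modality -> weight in [0, 1] (e.g. discount audio in noise).
    Returns a renormalized fused distribution over intent labels."""
    fused: dict[str, float] = {}
    for modality, dist in predictions.items():
        w = reliability.get(modality, 0.0)
        for label, p in dist.items():
            fused[label] = fused.get(label, 0.0) + w * p
    total = sum(fused.values()) or 1.0
    return {label: p / total for label, p in fused.items()}

fused = fuse(
    predictions={
        "vision": {"handover": 0.7, "point": 0.3},
        "audio":  {"handover": 0.4, "point": 0.6},
        "touch":  {"handover": 0.9, "point": 0.1},
    },
    reliability={"vision": 1.0, "audio": 0.3, "touch": 0.8},
)
print(max(fused, key=fused.get))  # reliability-weighted intent estimate
```

Late fusion is the easy case; the open problem the survey points to is deeper integration, where modalities disambiguate each other rather than just vote.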
Bidirectionality is the interaction design gap. Most current HRI systems are command-execution pipelines: humans instruct, robots act. True interaction requires bidirectionality — robots that can ask clarifying questions, signal uncertainty, and negotiate task completion. The VLN survey identifies this as a critical missing capability across the field.
Trust and acceptance are undertheorized. Trust appears as a theme in nearly every HRI survey, but is rarely operationally defined. Fear responses (especially in older adults toward social robots) are categorized into 7 dimensions including the Uncanny Valley effect, privacy concerns, and technology unfamiliarity — but the field lacks a unified model of how trust develops and breaks down in HRI.
Open Questions
- How should safety standards evolve for dexterous robots (21+ DOF hands) that operate in contact-rich, unpredictable scenarios far beyond the cobot assembly-line context ISO/TS 15066 was designed for?
- What is the right autonomy level for LLM-driven robots? When should a robot defer to a human vs. act autonomously?
- How do we build evaluation benchmarks for social HRI that measure human experience, not just task completion rates?
- Can proactive clarification (robot-initiated disambiguation) be implemented without being annoying or disrupting task flow?
- How do cultural differences in human-robot social norms affect global deployment of the same robot system?
Report
Human-Robot Interaction is one of the fastest-growing areas in robotics, and the reasons are structural: the same LLM revolution that transformed text AI is now arriving at robotics through the VLA paradigm, enabling robots to understand natural language, reason about context, and interact with humans in ways that were impossible with scripted systems. But HRI is not a single field — it spans physical interaction safety, social robot design, assistive technology, industrial cobots, and autonomous navigation, and these threads have different maturity levels, different research communities, and different definitions of success.
The physical side of HRI is more mature but is undergoing a regulatory reckoning. ISO/TS 15066, the standard that governs how close robots can work to humans in manufacturing, is built on biomechanical assumptions that are not well-validated. The 2026 critical review of these standards argues that energy transfer — not force or pressure — is the correct safety metric, and that common design approximations introduce unquantified performance penalties. The 2025 revision of ISO 10218-2 reflects a deeper shift: safety is no longer a property of the robot, but of the application. This has practical consequences for how robots are certified and deployed.
The social and cognitive side of HRI is less mature but is moving faster. LLMs have become the dominant enabling technology for natural interaction, and the CHI 2026 systematic review of 86 papers documents how rapidly the field has adopted them — while noting that the research is fragmented and human-centered considerations lag behind. The core insight is that LLMs give robots something they lacked before: the ability to understand context from language. A robot that hears “hand me the thing next to the blue cup” can parse that into a grounded spatial reference rather than failing on a syntax it was never trained on.
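The grounding step can be sketched: given a detected-object list with positions and attributes, resolving “the thing next to the blue cup” means finding the anchor object and returning its nearest neighbor. The toy scene and attribute schema are illustrative assumptions, not the output format of any particular perception pipeline.

```python
import math

# Toy grounding of a relational reference ("the thing next to the blue
# cup") against a detected-object list; positions are workspace (x, y) in m.
scene = [
    {"name": "blue cup", "color": "blue",  "kind": "cup",     "xy": (0.40, 0.10)},
    {"name": "stapler",  "color": "black", "kind": "stapler", "xy": (0.45, 0.12)},
    {"name": "red cup",  "color": "red",   "kind": "cup",     "xy": (0.90, 0.60)},
]

def ground_next_to(anchor_color: str, anchor_kind: str,
                   objects: list[dict]) -> dict:
    """Resolve 'the thing next to the <color> <kind>': find the anchor,
    then return the nearest *other* object by Euclidean distance."""
    anchor = next(o for o in objects
                  if o["color"] == anchor_color and o["kind"] == anchor_kind)
    others = [o for o in objects if o is not anchor]
    return min(others, key=lambda o: math.dist(o["xy"], anchor["xy"]))

target = ground_next_to("blue", "cup", scene)
print(target["name"])  # the object closest to the blue cup
```

In LLM-driven pipelines the language model typically emits the structured query (anchor attributes, relation) and a geometric routine like this resolves it against perception output.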
Multimodal perception ties the physical and social threads together. Robots in real environments need to integrate vision, audio, language, and touch to understand what humans are doing and what they need. The 20-year survey of MPDDM frameworks shows consistent progress in individual modalities and growing work on integration — but generalization remains the unresolved problem. Models trained in controlled lab settings fail when deployed in real homes, hospitals, or factories. This is the same gap that appears in VLA research (simulation vs. real-world performance), and it likely has the same root cause: distribution shift between training data and deployment environment.
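Distribution shift can at least be detected at deployment time, even when it cannot be prevented. A minimal monitor compares feature statistics between a lab batch and a deployment batch; real systems use stronger two-sample tests (MMD, Kolmogorov-Smirnov), so the standardized-mean-shift check and threshold below are only a sketch with assumed values.

```python
import statistics

def drift_score(train: list[float], deploy: list[float]) -> float:
    """Standardized mean shift |mean_d - mean_t| / std_t for one feature."""
    mu_t = statistics.fmean(train)
    sd_t = statistics.stdev(train) or 1.0
    return abs(statistics.fmean(deploy) - mu_t) / sd_t

# Illustrative feature: detected-person distance (m), lab vs. a real home.
lab  = [1.9, 2.1, 2.0, 1.8, 2.2, 2.0]
home = [0.9, 1.1, 1.0, 0.8, 1.2, 1.0]
score = drift_score(lab, home)
if score > 2.0:   # assumed alert threshold
    print(f"distribution shift detected (z = {score:.1f})")
```

A monitor like this does not fix the generalization gap, but it gives the system a trigger for the deferral and clarification behaviors discussed below.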
The VLN survey adds a dimension that is often missed in pure robotics work: the interaction is two-way. Humans don’t just issue commands — they adapt their instructions based on what the robot does, ask follow-up questions, and expect the robot to signal when it’s confused. Current systems don’t support this. Building bidirectional HRI systems requires solving uncertainty quantification (when does the robot know it doesn’t know?), proactive communication (how does the robot ask without being annoying?), and turn-taking in dialogue (how do human-robot conversations maintain coherence over multiple exchanges?).
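The first of those pieces, uncertainty quantification coupled to an ask-or-act rule, can be sketched simply: treat the robot's candidate interpretations of a command as a discrete belief, compute its Shannon entropy, and ask a clarifying question only when the belief is too flat. The candidate sets and the 0.9-bit threshold are illustrative assumptions.

```python
import math

def entropy_bits(probs: list[float]) -> float:
    """Shannon entropy H = -sum(p * log2 p) of a discrete belief."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def ask_or_act(belief: dict[str, float], threshold_bits: float = 0.9) -> str:
    """Act on the most likely interpretation if the belief is peaked;
    otherwise name the top two candidates in a clarifying question."""
    if entropy_bits(list(belief.values())) <= threshold_bits:
        return f"act: {max(belief, key=belief.get)}"
    top_two = sorted(belief, key=belief.get, reverse=True)[:2]
    return f"ask: did you mean {top_two[0]} or {top_two[1]}?"

print(ask_or_act({"red mug": 0.95, "red bowl": 0.05}))        # peaked belief
print(ask_or_act({"red mug": 0.45, "red bowl": 0.40, "red plate": 0.15}))
```

The hard part is not the rule but the threshold: set it too low and the robot is the annoying clarifier the open questions warn about; too high and it acts confidently on the wrong referent.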
The field is at an inflection point. The technical capabilities — LLMs, multimodal perception, dexterous manipulation — have advanced dramatically. The limiting factors now are human-centered: trust, acceptance, appropriate autonomy, and evaluation methodology that measures what actually matters (do humans prefer working with this robot?) rather than what’s easy to measure (did the robot complete the task?).