Microsoft Rho-alpha: VLA+ Model for Bimanual Manipulation

This article was generated by AI analysis.

Created: 2026-03-27 Source: https://www.microsoft.com/en-us/research/story/advancing-ai-for-the-physical-world/

Summary

Microsoft Research announces Rho-alpha (ρα), its first robotics model, derived from the Phi series of vision-language models. Microsoft positions it as a “VLA+” model: it extends standard vision-language-action (VLA) perception with tactile sensing, and the team is working toward continual learning from human corrective feedback. The model is co-trained on physical demonstrations, synthetic data (generated with NVIDIA Isaac Sim on Azure), and web-scale visual question answering (VQA) data. It targets bimanual manipulation on dual-arm and humanoid setups.

Key Points

VLA+ definition: beyond standard vision + language + action, it adds tactile sensing, force sensing (in progress), and continual deployment-time learning from human feedback (a stated goal, still being worked toward)
Training pipeline: physical demos + NVIDIA Isaac Sim synthetic data + web VQA data; the synthetic data feeds a sim-to-real pipeline aimed at contact-rich tasks where physical teleoperation is impractical
Human-in-the-loop correction: operators can correct the robot with a 3D mouse during deployment, and these corrections feed back into model adaptation; similar in spirit to RoboCopilot
BusyBox benchmark: a new physical interaction benchmark introduced by Microsoft Research; Rho-alpha demo shows bimanual manipulation cued by natural language instructions
Current status: under evaluation on dual-UR5e-arm setups and humanoid robots; technical paper forthcoming; access via Research Early Access Program
Adaptability as goal: “robots that can adapt more easily to dynamic situations and to human preferences will be more useful and more trusted”
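
The co-training mix listed under Key Points can be pictured as weighted sampling over three data sources. This is only a minimal sketch: the source names and mixing weights below are hypothetical, since the announcement does not state the actual ratios or describe how batches are assembled.

```python
import random

# Hypothetical mixing weights; the announcement does not give the real ratios.
MIX_WEIGHTS = {
    "physical_demos": 0.4,
    "isaac_sim_synthetic": 0.4,
    "web_vqa": 0.2,
}


def sample_batch_sources(batch_size, weights=MIX_WEIGHTS, seed=None):
    """Choose which data source each example in a batch is drawn from."""
    rng = random.Random(seed)  # local RNG so a seed gives reproducible batches
    names = list(weights)
    probs = [weights[name] for name in names]
    return rng.choices(names, weights=probs, k=batch_size)
```

Calling `sample_batch_sources(32, seed=0)` returns a list of 32 source names; a real data loader would then pull one example from each named source.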

Insights

Rho-alpha’s “VLA+” framing distinguishes it from the standard VLA pattern (vision + language → action) by treating tactile sensing as a first-class input modality. This reads as an acknowledgment that VLAs trained only on video + language struggle on contact-rich tasks because they lack force/contact information. Co-training with Isaac Sim synthetic data is a practical way to address data scarcity for rare contact scenarios: recovery from robot-human contact, for example, is hard to demonstrate via teleoperation.

Continual deployment-time learning from human feedback is the most interesting long-term claim: if it works at scale, deployed robots would improve from operator corrections rather than requiring offline retraining cycles.
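
One way to picture learning from operator corrections is a bounded buffer that records each intervention and signals when enough have accumulated to justify an adaptation step. The announcement does not describe Rho-alpha's actual mechanism, so every name here (`Correction`, `CorrectionBuffer`, `update_every`) is a hypothetical illustration, not Microsoft's design.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Sequence


@dataclass(frozen=True)
class Correction:
    """One operator intervention (hypothetical record layout)."""
    instruction: str                # natural-language task cue
    policy_action: Sequence[float]  # action the model proposed
    human_action: Sequence[float]   # action the operator commanded instead


class CorrectionBuffer:
    """Bounded FIFO store of corrections; the oldest entries are dropped first."""

    def __init__(self, capacity: int, update_every: int) -> None:
        self._items: Deque[Correction] = deque(maxlen=capacity)
        self._update_every = update_every
        self._since_update = 0

    def add(self, correction: Correction) -> bool:
        """Store a correction; return True when an adaptation step is due."""
        self._items.append(correction)
        self._since_update += 1
        return self._since_update >= self._update_every

    def drain_for_update(self) -> List[Correction]:
        """Return a snapshot for fine-tuning and reset the trigger counter."""
        self._since_update = 0
        return list(self._items)
```

In this sketch the buffer keeps its contents after `drain_for_update`, so recent corrections are replayed across successive updates. Whether to replay or discard them is a design choice the source does not address.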

Connections

Raw Excerpt

Rho-alpha is a VLA+ model in that it expands the set of perceptual and learning modalities beyond those typically used by VLAs. For perception, Rho-alpha adds tactile sensing. For learning, we are working toward enabling Rho-alpha to continually improve during deployment by learning from feedback provided by people.
