This article was generated by AI analysis.
Created: 2026-03-27 Source: https://www.microsoft.com/en-us/research/story/advancing-ai-for-the-physical-world/
Summary
Microsoft Research has announced Rho-alpha (ρα), its first robotics model, derived from the Phi series of vision-language models. It is positioned as a “VLA+” model: it extends standard vision-language-action (VLA) perception with tactile sensing and, as a stated goal still in progress, continual learning from human corrective feedback. The model is co-trained on physical demonstrations, synthetic data (generated with NVIDIA Isaac Sim on Azure), and web-scale visual question answering (VQA) data. It targets bimanual manipulation on dual-arm and humanoid setups.
Key Points
- VLA+ definition: beyond the standard vision + language + action pattern, adds tactile sensing, force sensing (in progress), and continual deployment-time learning from human feedback (also described as work in progress)
- Training pipeline: physical demos + NVIDIA Isaac Sim synthetic data + web VQA data; the sim-to-real pipeline specifically targets contact-rich tasks where physical teleoperation is impractical
- Human-in-the-loop correction: human operators can correct the robot with a 3D mouse during deployment, and these corrections feed back into model adaptation; similar in spirit to RoboCopilot
- BusyBox benchmark: a new physical interaction benchmark introduced by Microsoft Research; Rho-alpha demo shows bimanual manipulation cued by natural language instructions
- Current status: under evaluation on dual-UR5e-arm setups and humanoid robots; technical paper forthcoming; access via Research Early Access Program
- Adaptability as goal: “robots that can adapt more easily to dynamic situations and to human preferences will be more useful and more trusted”
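The training pipeline bullet above names three data sources. As a minimal sketch of how co-training batches could mix them, the snippet below picks a source for each example by weighted sampling. The source names, the weights, and the function `sample_batch_sources` are all hypothetical; the announcement does not state mixing ratios or describe the sampler.

```python
import random
from collections import Counter

# Hypothetical mixture weights; the announcement gives no actual ratios.
SOURCE_WEIGHTS = {
    "physical_demo": 0.4,   # teleoperated physical demonstrations
    "synthetic_sim": 0.4,   # simulated trajectories (e.g. from Isaac Sim)
    "web_vqa": 0.2,         # web-scale visual question answering data
}


def sample_batch_sources(batch_size, weights=SOURCE_WEIGHTS, seed=0):
    """Choose which data source each example in a batch is drawn from."""
    rng = random.Random(seed)  # seeded so the mixture is reproducible
    names = list(weights)
    probs = [weights[name] for name in names]
    return rng.choices(names, weights=probs, k=batch_size)


if __name__ == "__main__":
    batch = sample_batch_sources(1000)
    print(Counter(batch))  # counts per source, roughly proportional to the weights
```

In a real pipeline each source name would map to its own dataset loader; the sketch only shows the mixing decision.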
Insights
Rho-alpha’s “VLA+” framing distinguishes it from the standard VLA pattern (vision + language → action) by treating tactile sensing as a first-class input modality. This reads as an acknowledgment that current VLAs trained on video + language struggle on contact-rich tasks because they lack force/contact information. Co-training with Isaac Sim synthetic data is a practical way to address data scarcity for rare contact scenarios: some situations, such as recovering from robot-human contact, are hard to demonstrate via teleoperation.
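To make "tactile sensing as a first-class input modality" concrete, here is a minimal sketch of an observation record in which tactile readings sit alongside vision and language. The class `Observation`, its field names, and the list-based placeholders are assumptions for illustration; the announcement does not describe Rho-alpha's actual input format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Observation:
    """One timestep of model input. All names and shapes are illustrative."""
    image: List[List[float]]               # placeholder for a camera frame
    instruction: str                       # natural-language command
    tactile: Optional[List[float]] = None  # tactile readings, if a sensor is fitted

    def modalities(self) -> List[str]:
        """List the input modalities present in this observation."""
        names = ["vision", "language"]
        if self.tactile is not None:
            names.append("tactile")
        return names


if __name__ == "__main__":
    obs = Observation(image=[[0.0]], instruction="push the button", tactile=[0.2, 0.7])
    print(obs.modalities())
```

Making the tactile field optional lets the same record describe both a standard VLA input and a VLA+ input.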
The continual deployment-time learning from human feedback is the most interesting long-term claim. If it works at scale, deployed robots would improve from operator corrections rather than requiring offline retraining cycles.
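The correction loop can be sketched as a buffer that executes the operator's action whenever one is given and records the disagreement for later adaptation. This is only an illustration under stated assumptions: `CorrectionBuffer`, `min_delta`, and the tuple-based action format are hypothetical, and the announcement does not say how corrections are stored or how the model is updated from them.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Action = Tuple[float, ...]  # hypothetical action format, e.g. joint or end-effector deltas


@dataclass
class CorrectionBuffer:
    """Collects (model action, operator action) pairs during deployment."""
    min_delta: float = 1e-3  # ignore overrides that barely differ from the model
    records: List[Tuple[Action, Action]] = field(default_factory=list)

    def step(self, model_action: Action, operator_action: Optional[Action] = None) -> Action:
        """Return the action to execute, logging it if the operator overrode the model."""
        if operator_action is None:
            return model_action
        delta = max(abs(m - o) for m, o in zip(model_action, operator_action))
        if delta > self.min_delta:
            self.records.append((model_action, operator_action))
        return operator_action


if __name__ == "__main__":
    buf = CorrectionBuffer()
    buf.step((0.1, 0.2))              # no operator input: model action runs
    buf.step((0.1, 0.2), (0.1, 0.5))  # operator override: executed and recorded
    print(len(buf.records))
```

The recorded pairs would then serve as training signal for whatever adaptation method is used, which the sketch deliberately leaves out.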
Connections
Raw Excerpt
Rho-alpha is a VLA+ model in that it expands the set of perceptual and learning modalities beyond those typically used by VLAs. For perception, Rho-alpha adds tactile sensing. For learning, we are working toward enabling Rho-alpha to continually improve during deployment by learning from feedback provided by people.