This note was generated by AI analysis.
Created: 2026-03-28 Source: https://bostondynamics.com/blog/large-behavior-models-atlas-find-new-footing/
Summary
A joint blog post from Boston Dynamics and Toyota Research Institute (TRI) on Large Behavior Models (LBMs) for the Atlas humanoid robot. End-to-end, language-conditioned policies built on a 450M-parameter Diffusion Transformer carry out long-horizon tasks that combine manipulation and locomotion. The post also describes an inference-time speedup (1x→2-3x) that needs no retraining, and reactive failure recovery learned from additional demonstrations with no algorithmic changes.
Key Points
- Architecture: 450M-parameter Diffusion Transformer + flow-matching; conditioned on proprioception + images + language prompt; predicts action chunks of 48 steps (1.6s) at 30Hz
- Action space: gripper joints, neck yaw, torso pose, hand poses, foot poses (full body for Atlas; upper-body for Atlas MTS)
- Shared upper body between Atlas and Atlas MTS enables multi-embodiment co-training; data pooled across both platforms
- Inference-time speedup: because policies predict temporal action chunks, execution can be sped up 1.5-3x by adjusting chunk timing, without retraining (when task dynamics permit)
- Reactive recovery: demonstrations of recovering from failures were collected and the policy was retrained, yielding a reactive policy with no algorithmic changes; recovery is data-driven rather than explicitly programmed
- Teleoperation: VR headset with 1:1 mapping between operator and robot; head and hand tracking, with foot trackers added later for mobile manipulation (crouching, wide stance)
- Tasks: rope tying, tablecloth spreading, 22lb tire manipulation, screwdriving — deformable/heavy objects that are intractable for traditional programming
- Simulation is used both for fast iteration and as a co-training data source for multi-task policies
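The chunk arithmetic and the inference-time speedup in the points above can be sketched in a few lines. This is a minimal illustration, not the method from the blog post: the notes only say that chunk timing is adjusted, so the names `ChunkTiming` and `retime_chunk` and the choice of linear interpolation are assumptions made here. It also treats a chunk as a list of scalar setpoints; real pose targets would need proper pose interpolation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkTiming:
    """Timing of one predicted action chunk (values taken from the notes)."""
    steps: int = 48        # actions per chunk
    rate_hz: float = 30.0  # rate of the recorded demonstrations

    def duration_s(self, speedup: float = 1.0) -> float:
        """Wall-clock time one chunk covers when played `speedup` times faster."""
        if speedup <= 0:
            raise ValueError("speedup must be positive")
        return self.steps / (self.rate_hz * speedup)


def retime_chunk(chunk, speedup, control_rate_hz=30.0, data_rate_hz=30.0):
    """Resample scalar setpoints so the chunk plays `speedup` times faster.

    Hypothetical helper: linear interpolation is an assumption of this sketch,
    not something the source describes.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    if not chunk:
        return []
    last = len(chunk) - 1
    out = []
    k = 0
    while True:
        # Position in the original chunk that control tick k maps to.
        pos = k * speedup * data_rate_hz / control_rate_hz
        if pos > last + 1e-9:
            break
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(chunk[lo] * (1.0 - frac) + chunk[hi] * frac)
        k += 1
    return out


if __name__ == "__main__":
    timing = ChunkTiming()
    print(timing.duration_s())      # 48 steps at 30 Hz
    print(timing.duration_s(2.0))   # same chunk at 2x playback
    ramp = [float(i) for i in range(timing.steps)]
    print(len(retime_chunk(ramp, 2.0)))  # fewer control ticks cover the same motion
```

At 1x the resampled chunk is identical to the input, so the speedup is purely a deployment-time setting. That matches the point that no retraining is involved, though whether a given task tolerates the faster motion still depends on its dynamics.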
Insights
The most significant operational implication is the claim that “if you can demonstrate it, the robot can learn it”: LBMs remove the need for task-specific programming expertise. The inference-time speedup is an elegant, nearly free gain: because the policy predicts temporal action chunks, data can be collected at human demonstration speed and executed faster at deployment, as long as task dynamics permit. The reactive recovery result is also notable. Recovery was obtained by adding demonstrations and retraining, so failure handling becomes a data problem rather than an engineering problem. This extends a lesson familiar from sim-to-real work: if your policy can’t recover from a dropped part, show it recovering rather than programming the recovery.
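To make "failure handling is a data problem" concrete, here is a minimal sketch of pooling recovery demonstrations into an existing dataset while leaving the training recipe untouched. Everything here is hypothetical: the `Episode` fields, the `"nominal"`/`"recovery"` labels, and the `recovery_weight` sampling knob are illustrative assumptions, and the source does not say how the recovery data was weighted or mixed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    task: str
    kind: str  # "nominal" or "recovery" (hypothetical labels)


def build_training_set(nominal, recovery):
    """Pool nominal and recovery demonstrations into one dataset."""
    return list(nominal) + list(recovery)


def sample_batch(dataset, batch_size, recovery_weight=1.0, seed=0):
    """Draw a batch, optionally over- or under-sampling recovery episodes.

    The weighting scheme is an assumption of this sketch.
    """
    rng = random.Random(seed)
    weights = [recovery_weight if ep.kind == "recovery" else 1.0 for ep in dataset]
    return rng.choices(dataset, weights=weights, k=batch_size)


if __name__ == "__main__":
    nominal = [Episode("install_part", "nominal") for _ in range(8)]
    recovery = [Episode("install_part", "recovery") for _ in range(2)]
    dataset = build_training_set(nominal, recovery)
    batch = sample_batch(dataset, batch_size=16, recovery_weight=3.0)
    print(sum(ep.kind == "recovery" for ep in batch), "recovery episodes in batch")
```

The point of the sketch is that the only change is to the data passed in. The policy architecture and training loop, which are not shown, would stay the same.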
Connections
Raw Excerpt
Programming new manipulation behaviors no longer requires an advanced degree and years of experience. If you can demonstrate it, the robot can learn it — whether it’s stacking rigid blocks or folding a t-shirt, the training process is the same.