Uncertainty-Aware Robotic World Model Makes Offline MBRL Work on Real Robots

This note was generated by AI analysis.

Created: 2025-04-22 Source: https://arxiv.org/abs/2504.16680

Summary

RWM-U (ETH Zurich, 2025) extends autoregressive robotic world models with ensemble-based epistemic uncertainty estimation. The key safety insight: penalizing imagined transitions where the world model is uncertain (the MOPO-PPO scheme) steers the policy away from regions of the state space that the offline dataset does not cover, and those regions include dangerous or untested configurations. The method is demonstrated on real quadruped (ANYmal) and humanoid hardware for manipulation and locomotion tasks.

Key Points

RWM architecture: autoregressive transformer world model predicting next state token-by-token from action and history
Uncertainty via ensembles: train N world models; disagreement between predictions = epistemic uncertainty signal
MOPO-PPO: adapts the Model-based Offline Policy Optimization (MOPO) framework to PPO; subtracts an uncertainty estimate from the reward during imagined rollouts
Real-robot deployment: unlike most offline RL papers, RWM-U is actually deployed on ANYmal quadruped and humanoid hardware
Key problem solved: compounding errors in long-horizon rollouts. Propagating uncertainty through the rollout keeps the policy from trusting confident-looking predictions in regions the model has never seen
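
The ensemble and penalty ideas in the points above can be illustrated with a minimal sketch. It is not the paper's implementation: the toy linear "world models", the variance-based disagreement measure, and the names `ensemble_disagreement`, `penalized_reward`, and `penalty_coef` are all assumptions made for illustration.

```python
import numpy as np


def ensemble_disagreement(predictions: np.ndarray) -> np.ndarray:
    """Epistemic-uncertainty proxy from ensemble predictions.

    predictions has shape (n_models, batch, state_dim): one next-state
    prediction per ensemble member. Returns shape (batch,): the variance
    across members, averaged over state dimensions (an assumed measure).
    """
    return predictions.var(axis=0).mean(axis=-1)


def penalized_reward(reward, uncertainty, penalty_coef=1.0):
    """MOPO-style shaping: subtract a scaled uncertainty estimate."""
    return reward - penalty_coef * uncertainty


# Hypothetical ensemble: 5 linear models, next_state = state @ W_i,
# where each W_i is the identity plus a small random perturbation.
rng = np.random.default_rng(0)
n_models, state_dim = 5, 3
weights = np.eye(state_dim) + 0.05 * rng.standard_normal(
    (n_models, state_dim, state_dim)
)

states = np.stack([
    0.1 * np.ones(state_dim),   # stand-in for a state close to the data
    10.0 * np.ones(state_dim),  # stand-in for a state far from the data
])

# Shape (n_models, batch, state_dim): every member predicts every state.
predictions = np.einsum("bd,mde->mbe", states, weights)

uncertainty = ensemble_disagreement(predictions)
rewards = np.array([1.0, 1.0])
shaped = penalized_reward(rewards, uncertainty, penalty_coef=0.5)

print("uncertainty:", uncertainty)
print("shaped reward:", shaped)
```

In this toy setup the members disagree more on the far state, so its shaped reward drops further than that of the near state. A policy optimizer such as PPO trained on the shaped reward would therefore be discouraged from visiting it.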

Insights

The uncertainty penalization acts as a data-driven soft safety barrier: the offline dataset defines the “known safe” states, and the uncertainty signal discourages the policy from venturing beyond them. This is a weaker form of safety than formal constraint satisfaction (no hard guarantees) but is far more practical for real hardware deployment.
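
The difference between a soft penalty and a hard constraint can be written out explicitly. The first objective is the generic MOPO-style form; the symbols $\lambda$ (penalty weight), $u$ (uncertainty estimate), and $\epsilon$ (threshold) are notation assumed here, and the constrained variant is shown only for contrast, not as something the paper proposes.

```latex
% Soft penalty (MOPO-style): uncertainty lowers the reward but never forbids a state
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t} \gamma^{t} \bigl( r(s_t, a_t) - \lambda\, u(s_t, a_t) \bigr) \right]

% Hard constraint (for contrast): uncertain states are excluded outright
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t} \gamma^{t}\, r(s_t, a_t) \right]
\quad \text{s.t.} \quad u(s_t, a_t) \le \epsilon \;\; \forall t
```

Under the soft form, a sufficiently large reward can still outweigh the penalty, which is why no hard guarantee follows. For example, with $\lambda = 1$, a transition with $r = 1.0$ and $u = 0.2$ keeps a shaped reward of $0.8$, while one with $u = 3.0$ drops to $-2.0$.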

The ETH Zurich group (Marco Hutter's lab) is notable for consistently deploying learned policies on actual legged robots, which lends credibility to the paper’s real-hardware results.

Connections

Clippings-safedreamer-safe-reinforcement-learning-world-models: complementary. SafeDreamer enforces explicit cost constraints via Lagrangian methods; RWM-U applies soft uncertainty penalties
world-models
offline-rl
robotics
uncertainty
