Summary

SafeVLA (NeurIPS 2025 Spotlight) is the VLA-era counterpart to SafeDreamer: both come from PKU-Alignment, and both build on constrained reinforcement learning in a constrained Markov decision process (CMDP). Where SafeDreamer enforces safety constraints inside a DreamerV3 latent world model, SafeVLA enforces them directly on a vision-language-action (VLA) policy. Its Integrated Safety Approach (ISA) systematically elicits unsafe VLA behaviors, then constrains the policy against them via safe RL. The authors report an 83.58% reduction in safety violations while maintaining task performance.
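For reference, the generic CMDP objective both methods build on can be written as follows. The notation (reward r, safety cost c, cost budget d, discount γ, Lagrange multiplier λ, dual step size η_λ) is standard textbook notation chosen for this note, not copied from either paper. The second line is the usual Lagrangian relaxation used to solve such problems, and the third is the typical projected update for the multiplier.

```latex
\max_{\theta}\; J_R(\pi_\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\Big[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{s.t.} \quad
J_C(\pi_\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\Big[\sum_{t=0}^{T} \gamma^{t}\, c(s_t, a_t)\Big] \le d

\min_{\lambda \ge 0}\; \max_{\theta}\; \mathcal{L}(\theta, \lambda) = J_R(\pi_\theta) - \lambda \big( J_C(\pi_\theta) - d \big)

\lambda \leftarrow \max\!\big(0,\; \lambda + \eta_\lambda \,( \hat{J}_C - d )\big)
```

Intuitively, λ grows while the estimated cumulative cost exceeds the budget, which penalizes unsafe behavior more heavily in the policy update until the constraint is satisfied.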

Key Points

  • CMDP paradigm: reward maximization subject to a constraint on cumulative safety cost, the same framework as SafeDreamer
  • Unsafe behavior elicitation: proactively generates failure modes the model hasn’t encountered, improving constraint coverage
  • ISA pipeline: requirements → elicitation → constrained training → targeted evaluation
  • Benchmark: Safety-CHORES, a suite of long-horizon mobile manipulation tasks with diverse safety requirements
  • Spotlight at NeurIPS 2025: indicates that VLA safety alignment is now a recognized research direction
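The CMDP and constrained-training points above can be made concrete with a toy primal-dual loop. This is a minimal sketch under loudly stated assumptions: the "policy" is a single hypothetical speed parameter `v`, and `reward`, `cost`, and `solve_cmdp` are invented for illustration. SafeVLA applies constrained RL to a full VLA policy using sampled rollouts, and the exact optimizer it uses should be checked in the paper. What the sketch does show is the Lagrangian mechanism itself: gradient ascent on the policy parameter against the penalized objective, plus a projected step that raises the multiplier `lam` while the constraint is violated.

```python
"""Toy primal-dual (Lagrangian) solver for a one-parameter constrained problem.

Hypothetical setup: the "policy" is a single speed v. Moving faster earns more
task reward but also raises a safety cost, and the constraint caps that cost.
"""


def reward(v: float) -> float:
    # Task reward; without a constraint it peaks at v = 1.0.
    return 2.0 * v - v * v


def cost(v: float) -> float:
    # Safety cost, e.g. a collision-risk proxy that grows with speed.
    return v * v


def solve_cmdp(budget: float = 0.25, lr: float = 0.05, steps: int = 2000) -> tuple[float, float]:
    v, lam = 0.0, 0.0  # policy parameter and Lagrange multiplier
    for _ in range(steps):
        # Gradient of L(v, lam) = reward(v) - lam * (cost(v) - budget) with respect to v.
        grad_v = (2.0 - 2.0 * v) - lam * 2.0 * v
        # Constraint violation at the current iterate (positive means over budget).
        violation = cost(v) - budget
        # Primal: gradient ascent in v.
        v += lr * grad_v
        # Dual: lam grows while over budget and shrinks otherwise, clipped at 0.
        lam = max(0.0, lam + lr * violation)
    return v, lam


v, lam = solve_cmdp()
print(f"v={v:.3f}  lambda={lam:.3f}  reward={reward(v):.3f}  cost={cost(v):.3f}")
```

Without the constraint, reward is maximized at v = 1 with cost 1.0. With a budget of 0.25 the constraint is tight at v = 0.5, where the stationarity condition 2 - 2v - 2λv = 0 gives λ = 1, so the loop should settle near those values and trade some reward (0.75 instead of 1.0) for staying within budget. Both variables are updated from the same iterate here; in practice, stability depends on the step sizes.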

Insights

SafeVLA is the natural synthesis of two prior lines of work: (1) CMDP-based safe RL for conventional RL agents (SafeDreamer), and (2) VLA foundation models as robot policies. Combining them lets the VLA’s semantic understanding inform what counts as unsafe behavior, which is difficult to express with a hand-specified reward or cost function alone.

The unsafe behavior elicitation step is noteworthy: rather than waiting for violations to surface in ordinary environment rollouts, it actively and adversarially generates them. This is the VLA counterpart of Parmar 2026’s “safety probing” concept.
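A rough picture of the elicitation idea, as a hypothetical sketch: sample scene parameters, run the current policy, and keep the scenes in which it violates a safety constraint, ranked by severity. Everything here is invented for illustration, including `Scenario`, `toy_policy`, `violation_margin`, `elicit_unsafe`, the speed limits, and the use of plain random search as a stand-in for an adversarial generator. SafeVLA's actual elicitation procedure is described in the paper and is not reproduced here.

```python
import random
from dataclasses import dataclass


@dataclass
class Scenario:
    # Hypothetical scene parameters that an elicitation procedure can vary.
    obstacle_dist: float  # distance to the nearest obstacle (metres)
    fragile: bool         # whether a fragile object is within reach


def toy_policy(s: Scenario) -> float:
    # Stand-in for a VLA policy that outputs a commanded speed.
    # Deliberately flawed: it slows near obstacles but ignores fragility.
    return min(1.0, s.obstacle_dist)


def violation_margin(s: Scenario, speed: float) -> float:
    # Hypothetical safety rule: the speed limit is stricter near fragile objects.
    limit = 0.3 if s.fragile else 0.8
    return speed - limit  # > 0 means the constraint is violated


def elicit_unsafe(n_samples: int, seed: int = 0) -> list[tuple[float, Scenario]]:
    """Random-search elicitation: sample scenes, keep violating ones, worst first."""
    rng = random.Random(seed)
    found = []
    for _ in range(n_samples):
        s = Scenario(obstacle_dist=rng.uniform(0.0, 2.0), fragile=rng.random() < 0.5)
        margin = violation_margin(s, toy_policy(s))
        if margin > 0:
            found.append((margin, s))
    # Most severe violations first, so they can be prioritized for training.
    found.sort(key=lambda item: item[0], reverse=True)
    return found


cases = elicit_unsafe(500)
print(len(cases), "unsafe scenes; worst margin", round(cases[0][0], 3))
```

Scenes flagged this way (here, largely fragile-object scenes that the toy policy ignores) would then feed the constrained-training step, and a held-out set of them could serve as targeted evaluation. A real adversarial generator would search the scene space more deliberately than uniform sampling does.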

Connections