This note was generated by AI analysis
Created: 2025-01-01
Summary
A practitioner’s account of iterating through multiple robot learning approaches with an SO-100 arm. The author tried PPO (reinforcement learning, sim2real with IsaacLab), FPO (flow policy, which struggled with exploding gradients), FPO++ (an improved version), ACT (Action Chunking with Transformers, trained on 50 demonstration episodes), and SmolVLA fine-tuning (a vision-language-action model). The article reflects on the principles these methods share despite the rapidly changing landscape.
Key Points
- PPO: model-free RL, requires careful sim environment design for sim2real transfer; IsaacLab used for simulation
- FPO (Flow Policy): continuous action distribution via normalizing flows; author encountered exploding gradient issues
- FPO++: improved stability over FPO; resolved gradient issues
- ACT: 50 demonstration episodes sufficient for simple manipulation tasks; chunked actions improve smoothness
- SmolVLA: Hugging Face’s small VLA model; fine-tunable with modest compute
- The “same three things” framing: perception, planning, and action remain the core challenges regardless of method
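The chunked-action point can be illustrated with the temporal ensembling scheme described in the ACT paper: when overlapping chunks each contain a prediction for the same timestep, those predictions are averaged with weights exp(-m * i), where i = 0 is the oldest prediction. The sketch below is plain Python, not the author's code and not the LeRobot implementation. The function name `temporal_ensemble`, the layout of `chunks`, and the default value of `m` are assumptions made for illustration, and the source does not say whether the author enabled ensembling at all.

```python
import math


def temporal_ensemble(chunks, t, m=0.1):
    """Blend every prediction that was made for timestep t.

    chunks[s] is the chunk predicted at step s: a list of actions for
    steps s, s+1, s+2, ... Each action is a list of floats.
    Weights are exp(-m * i), with i = 0 for the oldest prediction.
    """
    preds = []
    for s in range(t + 1):  # iterate oldest chunk first
        offset = t - s  # position of timestep t inside chunk s
        if s < len(chunks) and offset < len(chunks[s]):
            preds.append(chunks[s][offset])
    if not preds:
        raise ValueError("no chunk covers timestep t")

    weights = [math.exp(-m * i) for i in range(len(preds))]
    total = sum(weights)
    dim = len(preds[0])
    # Weighted average, computed per action dimension.
    return [
        sum(w * p[d] for w, p in zip(weights, preds)) / total
        for d in range(dim)
    ]


# Hypothetical 1-DoF example: two chunks of length 3, predicted at steps 0 and 1.
example_chunks = [
    [[0.0], [1.0], [2.0]],  # predicted at step 0 for steps 0, 1, 2
    [[1.5], [2.5], [3.5]],  # predicted at step 1 for steps 1, 2, 3
]
blended = temporal_ensemble(example_chunks, t=1)
```

With `m = 0` the blend is a plain average of the overlapping predictions. A larger `m` weights older predictions more heavily, which smooths the executed trajectory at the cost of reacting more slowly to new observations.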
Insights
- 50 episodes is surprisingly few for ACT, which suggests that demonstration quality matters more than quantity for simple tasks
- The progression from RL → flow policies → VLAs reflects the field’s broader trajectory: from hand-crafted rewards to imitation to language-conditioned generalization
- Exploding gradients in FPO on real robot data suggest that flow policies are more sensitive to distribution shift than RL methods
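The note does not say how FPO++ resolved the gradient problem. For context, one generic mitigation for exploding gradients, independent of FPO, is clipping by global norm: if the joint L2 norm of all gradients exceeds a threshold, every gradient is scaled down by the same factor. The sketch below is plain Python under that assumption. It is not the FPO++ fix, and the name `clip_by_global_norm` and the list-of-lists gradient layout are hypothetical.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so their joint L2 norm is at most max_norm.

    grads is a list of per-parameter gradient vectors (lists of floats).
    Returns (clipped_grads, original_norm).
    """
    total = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total <= max_norm:
        # Already within the limit: return an unchanged copy.
        return [list(vec) for vec in grads], total
    scale = max_norm / total
    return [[g * scale for g in vec] for vec in grads], total


# Hypothetical gradients with joint norm 5.0, clipped to norm 1.0.
clipped, norm_before = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
```

Because all gradients share one scale factor, the update direction is preserved and only its magnitude is bounded. Logging the pre-clip norm is a cheap way to see when training starts to diverge.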
Connections
- RH20T dataset in this vault: the data collection challenge ACT faces (needing demos) is exactly what RH20T addresses at scale
- Sunday.ai Memo robot: also uses imitation learning via a specialized data capture glove
- Open source robotics stack article: ACT is part of the LeRobot library, which Arne Baeyens recommends
Raw Excerpt
“After going through PPO, FPO, FPO++, ACT, and SmolVLA in a few months, I realized the tools change constantly but the problem doesn’t: you need the robot to perceive its environment, plan an action, and execute it reliably. Every new method is just a different answer to the same three questions.”