LeRobot Field Report: ACT vs GR00T-N1 on Real SO-ARM100 — Practical Performance and Dataset Requirements

This article was generated by AI analysis.

Created: 2026-04-05 Source: https://www.ml6.eu/en/blog/ai-robotics-a-field-report-on-imitation-learning-with-lerobot

Summary

ML6’s hands-on field report is one of the most practically grounded LeRobot evaluations available. They ran ACT and GR00T-N1 on SO-ARM100 arms across structured pick-and-place and deformable-object tasks. ACT reaches 90% success on a simple positional task with about 46k frames but fails under distribution shift, such as a changed camera angle. GR00T-N1 handles more complex tasks (textile manipulation at 60–80% success), but its motion stutters because of inference latency. The central finding: data quality and curation matter more than model choice, and loss curves do not predict physical success.

Key Points

ACT on 5-position task: 90% success with 46k frames (~25 min teleoperation recording)
ACT failure mode: zero generalization to camera angle changes — brittle to distribution shift
GR00T-N1 on textile spreading: 60% success with 53k frames (29 min)
GR00T-N1 on towel folding: 80% with 76k frames (42 min)
GR00T-N1 failure mode: inference latency causes stuttering motion — addressed in N1.5 + async inference
Dataset curation rule: 4 factors — accuracy, controlled sequences, comprehensive coverage, robustness (include error recovery)
Evaluation challenge: loss does NOT correlate with physical success; mm-level errors cause manipulation failure
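
Because loss does not track physical success, the success rates above have to come from counting real trials, and a small number of trials leaves a wide margin of uncertainty. The sketch below puts a Wilson score interval around a measured success rate. The source does not state how many trials ML6 ran per task, so the trial count of 10 here is purely hypothetical, and the choice of the Wilson interval is an assumption of this note, not ML6’s method.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a success rate (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical: a 90% success rate measured over only 10 physical trials.
low, high = wilson_interval(successes=9, trials=10)
print(f"90% over 10 trials -> plausible range {low:.2f} to {high:.2f}")
```

With so few trials the interval spans roughly 0.60 to 0.98, which is a reminder to report the trial count alongside any success percentage.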

Insights

The recording times are the key practical constraint: 25–42 minutes of teleoperation is enough to reach usable performance. That is accessible, but it requires skilled operators, because adding bad demonstrations hurts more than having fewer good ones.
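
As a quick sanity check on those recording times, the frame counts and durations listed under Key Points all imply a capture rate of about 30 frames per second. The sketch below redoes that arithmetic and inverts it to estimate recording time for a target frame count. It assumes the reported frame counts are rounded, per-timestep totals; the 30 fps default is inferred from these numbers, not stated by the source.

```python
# Frame counts and recording durations as listed under Key Points.
RECORDINGS = {
    "ACT, 5-position task": (46_000, 25),
    "GR00T-N1, textile spreading": (53_000, 29),
    "GR00T-N1, towel folding": (76_000, 42),
}


def implied_fps(frames: int, minutes: float) -> float:
    """Average frames per second implied by a frame count and a duration."""
    return frames / (minutes * 60)


def minutes_needed(frames: int, fps: float = 30.0) -> float:
    """Recording time in minutes for a target frame count (30 fps is an assumption)."""
    return frames / fps / 60


for task, (frames, minutes) in RECORDINGS.items():
    print(f"{task}: {implied_fps(frames, minutes):.1f} fps")

print(f"100k frames at 30 fps: {minutes_needed(100_000):.0f} min of teleoperation")
```

Under the same assumption, a 100k-frame dataset would take a little under an hour of clean teleoperation, before discarding any bad episodes.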

The fact that ACT fails on camera angle changes while GR00T-N1 handles deformable objects suggests the two models occupy different niches: ACT for precise, repetitive tasks in fixed setups; VLAs for tasks requiring semantic understanding or handling physical variation.
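
The stuttering attributed to GR00T-N1 under Key Points can be read as a timing budget: if the arm must wait for each inference call before it can move, every action chunk is preceded by a pause. The sketch below is a deliberately simplified model of that budget, not ML6’s or LeRobot’s implementation. The chunk length, control rate, and latency values are hypothetical, and "asynchronous" here simply means the next chunk is computed while the current one is still executing.

```python
def idle_time_per_chunk(
    chunk_len: int,
    control_hz: float,
    latency_s: float,
    asynchronous: bool,
) -> float:
    """Seconds per action chunk during which the arm has no action to execute."""
    chunk_duration_s = chunk_len / control_hz
    if asynchronous:
        # Inference overlaps with execution, so the arm only pauses when
        # inference takes longer than the chunk it is currently executing.
        return max(0.0, latency_s - chunk_duration_s)
    # Synchronous: the arm waits out the full inference call before each chunk.
    return latency_s


# Hypothetical numbers: 16-action chunks at 30 Hz, 0.4 s per inference call.
sync_idle = idle_time_per_chunk(16, 30.0, 0.4, asynchronous=False)
async_idle = idle_time_per_chunk(16, 30.0, 0.4, asynchronous=True)
print(f"synchronous idle per chunk:  {sync_idle:.2f} s")
print(f"asynchronous idle per chunk: {async_idle:.2f} s")
```

In this toy model the pause disappears once inference fits inside the time it takes to execute one chunk, which is consistent with the report’s note that async inference addresses the stutter.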

ML6 placed 3rd in the 2025 LeRobot Hackathon — using Gaussian splatting to handle camera instability. This is a real production pattern for stabilizing visual observations.

Connections

Raw Excerpt

Imitation learning is “closer than most expect” for production robotics in controlled environments with repetitive, structured tasks.
