SmolVLA: 450M VLA Model with 78.3% SO-100 Success Rate — Accessible Fine-Tuning for Real Robots

This note was generated by AI analysis.

Created: 2026-04-05 Source: https://huggingface.co/docs/lerobot/en/smolvla

Summary

SmolVLA is Hugging Face’s answer to the “too small to pretrain, too large to ignore” problem. At 450M parameters it sits between ACT (52M) and π₀ (3.5B). It is pretrained on 481 community-contributed datasets (~23k episodes, 10.6M frames), primarily SO-100 demonstrations, and then fine-tuned on task-specific data. The pretraining gain is substantial: the success rate on the SO-100 real-robot benchmark rises from 51.7% to 78.3%. Fine-tuning takes roughly 50 episodes of task data and about 4 hours (20k steps) on a single A100. Inputs are multi-view camera images, proprioceptive state, and a language instruction; the output is an action chunk, that is, a short sequence of future actions predicted in one forward pass.
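
To make the input and output description above concrete, here is a minimal sketch of how a controller might consume an action chunk: the policy is queried once, the predicted actions are executed one at a time, and the policy is queried again only when the chunk runs out. Everything in it is an assumption for illustration, not the LeRobot API: `ChunkedController`, `dummy_policy`, the observation keys, the chunk length of 4, and the 6-dimensional action are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Action = List[float]


class ChunkedController:
    """Runs a chunk-predicting policy one action at a time (illustrative only)."""

    def __init__(self, predict_chunk: Callable[[dict], Sequence[Action]]):
        self.predict_chunk = predict_chunk  # hypothetical policy interface
        self.queue: Deque[Action] = deque()
        self.policy_calls = 0

    def step(self, observation: dict) -> Action:
        # Query the policy only when the previous chunk has been used up.
        if not self.queue:
            self.queue.extend(self.predict_chunk(observation))
            self.policy_calls += 1
        return self.queue.popleft()


def dummy_policy(observation: dict) -> List[Action]:
    # Stand-in for the model: returns a chunk of 4 actions, each 6-dimensional.
    base = observation["state"]
    return [[x + 0.01 * (i + 1) for x in base] for i in range(4)]


controller = ChunkedController(dummy_policy)
obs = {
    "images": {"top": None, "wrist": None},  # multi-view cameras (placeholders)
    "state": [0.0] * 6,                      # proprioception
    "task": "pick up the cube",              # language instruction
}
actions = [controller.step(obs) for _ in range(8)]
print(controller.policy_calls)  # 8 control steps with chunks of 4 -> 2 policy calls
```

The point of the queue is that model inference runs less often than the control loop; a real deployment may also re-plan before the chunk is exhausted, which this sketch does not do.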

Key Points

Architecture: multi-view cameras + proprioception state + language instruction → action expert → action chunk
Pretraining data: 481 datasets, ~23k episodes, 10.6M frames — primarily SO-100 demonstrations
Fine-tuning cost: 20k steps; ~4 hours on a single A100; a Colab notebook is also available
Minimum viable dataset: ~50 episodes per task (25 were not enough), with ~10 episodes per variation
Inference command: lerobot-record --policy.path=user/smolvla_finetuned
Improvement from pretraining: +26.6 percentage points over task-specific training alone
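
The dataset guidance in the list above (about 10 episodes per variation, at least 50 per task) can be turned into a small planning check. This is only a sketch: the helper `plan_recording` and the variation names are hypothetical, and the two constants simply restate the numbers from this note.

```python
from typing import Dict, List, Tuple

MIN_EPISODES_PER_TASK = 50   # from the note: 25 episodes were not enough
EPISODES_PER_VARIATION = 10  # from the note: ~10 episodes per variation


def plan_recording(variations: List[str]) -> Tuple[Dict[str, int], int, bool]:
    """Return (episodes per variation, total episodes, whether the minimum is met)."""
    plan = {name: EPISODES_PER_VARIATION for name in variations}
    total = sum(plan.values())
    return plan, total, total >= MIN_EPISODES_PER_TASK


# Hypothetical example: one pick-place task recorded at five object positions.
plan, total, ok = plan_recording([f"cube_position_{i}" for i in range(1, 6)])
print(total, ok)
```

With five variations the plan reaches the 50-episode mark; with fewer variations, either more episodes per variation or more variations would be needed to get there.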

Insights

The 50-episode requirement means roughly 25–50 minutes of teleoperation per new task, assuming 30–60 seconds per episode, which is practical for lab settings. The key design insight is that pretraining on community data from the same hardware platform (SO-100) transfers strongly. This differs from cross-embodiment transfer, where gains are less reliable.
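
The teleoperation estimate above is simple arithmetic, sketched here so the assumption is explicit. The 30–60 second episode length is inferred from the 25–50 minute range rather than taken from the source, and the helper `teleop_minutes` is hypothetical; time spent resetting the scene between episodes is not counted.

```python
def teleop_minutes(episodes: int, seconds_per_episode: float) -> float:
    """Pure recording time in minutes, excluding resets between episodes."""
    return episodes * seconds_per_episode / 60.0


low = teleop_minutes(50, 30)   # short episodes
high = teleop_minutes(50, 60)  # long episodes
print(low, high)
```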

The Colab fine-tuning path matters: it means researchers without A100 access can still fine-tune SmolVLA, lowering the barrier further.

Connections

Raw Excerpt

Pretraining SmolVLA on a corpus of community datasets led to a substantial improvement in real-world performance on the SO-100 robot benchmark, elevating success rates from 51.7% to 78.3%.
