This document was generated by AI analysis.
Created: 2026-03-27 Source: https://arxiv.org/abs/2306.03310
Summary
LIBERO is a benchmark for lifelong robot learning that separates two types of knowledge transfer: declarative (object/spatial concepts) and procedural (actions/behaviors). It provides 130 tasks across 4 task suites, plus high-quality human-teleoperated demos. Three counterintuitive findings: sequential finetuning beats dedicated lifelong learning algorithms, no single visual encoder dominates across knowledge types, and naive supervised pretraining can hurt downstream performance.
Prerequisites
- Lifelong / continual learning — the benchmark is built around the challenge of learning new tasks without forgetting old ones (catastrophic forgetting); understanding this trade-off between plasticity and stability is foundational
- Declarative vs. procedural knowledge — declarative: facts about the world (what objects exist, where); procedural: how to act (motor skills, behaviors). LIBERO’s 4 suites isolate these for controlled study
- Behavior cloning / imitation learning — all baselines train visuomotor policies from human demonstration data; knowing BC is the standard approach helps contextualize why pretraining effects are surprising
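To make the plasticity/stability trade-off concrete, here is a toy sketch of behavior cloning with sequential finetuning. It is not LIBERO code: the 1-D linear "policy", the two hypothetical tasks (expert slopes 2.0 and -1.0), and the hyperparameters are all invented for illustration. It only shows that finetuning on a second task with no safeguards can erase performance on the first.

```python
# Toy illustration only: a 1-D linear "policy" a = w * s trained by
# behavior cloning (mean-squared error on demonstration pairs).
# The tasks, slopes, and hyperparameters are made up for this sketch.

def make_demos(slope, n=20):
    """Return (state, expert_action) pairs for a hypothetical task."""
    states = [i / n for i in range(1, n + 1)]
    return [(s, slope * s) for s in states]

def bc_loss(w, demos):
    """Mean-squared error between policy actions and expert actions."""
    return sum((w * s - a) ** 2 for s, a in demos) / len(demos)

def finetune(w, demos, lr=0.5, epochs=200):
    """Plain gradient descent on the BC loss, starting from weight w."""
    for _ in range(epochs):
        grad = sum(2 * (w * s - a) * s for s, a in demos) / len(demos)
        w -= lr * grad
    return w

task_a = make_demos(slope=2.0)
task_b = make_demos(slope=-1.0)

w = finetune(0.0, task_a)           # learn task A from scratch
loss_a_before = bc_loss(w, task_a)  # task A loss right after learning A

w = finetune(w, task_b)             # sequentially finetune on task B
loss_a_after = bc_loss(w, task_a)   # task A loss after learning B (forgetting)
loss_b_after = bc_loss(w, task_b)   # task B loss (plasticity)

print(f"A before B: {loss_a_before:.4f}, A after B: {loss_a_after:.4f}, "
      f"B: {loss_b_after:.4f}")
```

With only one shared parameter the two tasks conflict completely, so forgetting here is total. Real visuomotor policies have far more capacity, which is part of why the benchmark's empirical results are not obvious in advance.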
Core Idea
LIBERO’s key design insight is that prior lifelong learning benchmarks in vision and NLP mostly study declarative knowledge transfer, whereas robot manipulation requires both declarative and procedural knowledge, and the two may transfer differently. The four suites act as controlled experiments: LIBERO-Spatial (vary spatial layout, same objects and goals), LIBERO-Object (vary objects), LIBERO-Goal (vary goals), and LIBERO-100 (all factors entangled, closer to real-world complexity). Holding the other factors fixed lets an experiment isolate which knowledge type a given algorithm actually transfers. LIBERO-100 is further split into LIBERO-90 and LIBERO-10 to support studying pretraining followed by fine-tuning, the dominant paradigm in modern robot learning.
Results
Task suites:
| Suite | Tasks | Knowledge type | Description |
|---|---|---|---|
| LIBERO-Spatial | 10 | Declarative (spatial) | Same objects, vary spatial layout |
| LIBERO-Object | 10 | Declarative (object) | Same spatial layout, vary objects |
| LIBERO-Goal | 10 | Procedural (goal) | Same scene, vary task goal |
| LIBERO-100 | 100 | Entangled | Realistic mixed distribution |
| — LIBERO-90 | 90 | Pretraining split | Used to pretrain a policy |
| — LIBERO-10 | 10 | Downstream eval | Tests lifelong learning after pretraining |
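The table above can be encoded as a small registry, which is handy for sanity-checking task counts in evaluation scripts. This is a sketch under stated assumptions: `Suite`, `SUITES`, and `LIBERO_100_SPLITS` are hypothetical names chosen here, not identifiers from the LIBERO package.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Suite:
    """One row of the suite table (hypothetical helper type)."""
    name: str
    num_tasks: int
    knowledge: str

# The four top-level suites, as listed in the table.
SUITES = [
    Suite("LIBERO-Spatial", 10, "declarative (spatial)"),
    Suite("LIBERO-Object", 10, "declarative (object)"),
    Suite("LIBERO-Goal", 10, "procedural (goal)"),
    Suite("LIBERO-100", 100, "entangled"),
]

# LIBERO-100 is partitioned into a pretraining split and a downstream split.
LIBERO_100_SPLITS = {"LIBERO-90": 90, "LIBERO-10": 10}

total_tasks = sum(suite.num_tasks for suite in SUITES)
split_total = sum(LIBERO_100_SPLITS.values())

print(f"total tasks: {total_tasks}, LIBERO-100 split total: {split_total}")
```

Note that LIBERO-90 and LIBERO-10 are sub-splits of LIBERO-100, so they are not counted again in the 130-task total.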
Key findings (qualitative — paper does not provide a single summary table):
- Sequential finetuning > lifelong learning algorithms in forward transfer — the overhead of EWC, PackNet, etc. doesn’t pay off
- No single visual encoder wins across all suites — ResNet, ViT, and other architectures each have strengths/weaknesses depending on knowledge type
- Naive pretraining on LIBERO-90 can hurt LIBERO-10 performance — pretraining on too-similar data biases the policy, interfering with new task acquisition
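Findings like these are usually read off a matrix of success rates, where entry `success[i][j]` is the success rate on task `j` after training through task `i`. The sketch below computes two simplified quantities from such a matrix. It is an assumption-laden illustration: the function names and the example numbers are invented, and the paper's own metric definitions may differ in detail.

```python
def forward_transfer(success):
    """Mean success on each task right after it is learned (the diagonal)."""
    k = len(success)
    return sum(success[i][i] for i in range(k)) / k

def forgetting(success):
    """Average drop on earlier tasks between when each was learned and
    the end of the task sequence. Positive values mean forgetting."""
    k = len(success)
    drops = [success[j][j] - success[k - 1][j] for j in range(k - 1)]
    return sum(drops) / len(drops)

# Hypothetical success rates for a 3-task sequence (not results from the paper).
# Row i: evaluated after training through task i. Column j: task j.
success = [
    [0.75, 0.00, 0.00],
    [0.50, 1.00, 0.00],
    [0.25, 0.50, 0.50],
]

fwt = forward_transfer(success)
fgt = forgetting(success)
print(f"forward transfer: {fwt:.3f}, forgetting: {fgt:.3f}")
```

Under this reading, a method can score well on forward transfer while still forgetting earlier tasks, so the two numbers should be reported together.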
Context in the field (as of 2026): LIBERO is considered “nearly saturated” — SOTA approaches score 95%+ on the standard suites, meaning new papers claiming 96-98% are not meaningfully differentiated. This is noted in the State of VLA Research at ICLR 2026.
Limitations
- Author-stated: benchmark is limited to tabletop manipulation with a fixed arm (no mobile manipulation, no humanoids)
- Author-stated: procedural generation pipeline is currently constrained to kitchen/tabletop objects
- Unstated: by 2026 the standard suites are saturated — LIBERO-100 (especially LIBERO-10) remains more discriminative but is less commonly reported
- Unstated: all tasks use a single camera viewpoint and fixed scene structure — doesn’t test visual generalization across viewpoints or lighting
Reproducibility
- Code: open-source at https://github.com/Lifelong-Robot-Learning/LIBERO
- Datasets: human teleoperation demos for all 130 tasks, downloadable via script or HuggingFace (yifengzhu-hf/LIBERO-datasets)
- Compute: PyTorch with CUDA; compatible with standard GPU setups (paper uses 1-2 GPUs for BC baselines)
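As a rough sketch of the HuggingFace route, the demos could be fetched with the `huggingface_hub` client. The assumptions here are that `huggingface_hub` is installed, that the repo named above is hosted as a dataset repo, and that `download_libero_demos` and the `libero_datasets` directory are hypothetical names chosen for this example. The official download script in the LIBERO repository remains the reference path.

```python
from pathlib import Path

# Repo name as given in the Reproducibility notes above.
REPO_ID = "yifengzhu-hf/LIBERO-datasets"

def download_libero_demos(local_dir="libero_datasets"):
    """Download the demonstration files from the Hugging Face Hub.

    Assumes `pip install huggingface_hub` and that the repo is a dataset
    repo (repo_type="dataset"); adjust if that assumption does not hold.
    """
    # Imported lazily so this module loads even without the dependency.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        local_dir=local_dir,
    )
    return Path(path)

# Usage (requires network access and disk space):
#   demo_root = download_libero_demos()
#   print(demo_root)
```

The function is defined but not called here, since the download needs network access and may be large.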
Insights
The most practically important finding is that sequential finetuning beats dedicated lifelong learning methods on forward transfer. This suggests that the lifelong learning research community has been optimizing for a problem (forgetting) that may not be the binding constraint in robot learning. Forward transfer (learning new tasks efficiently given prior experience) appears to matter more, and simple finetuning handles it adequately at this scale.
The pretraining-hurts result is subtle but important. A plausible explanation is that pretraining on LIBERO-90 creates a strong prior toward behaviors that are similar but not identical to the LIBERO-10 tasks, causing negative transfer. This is a warning for the broader trend of “pretrain on everything, fine-tune on target”: the domain shift between pretraining and target tasks needs to be managed carefully.
As of 2026, LIBERO is a de facto standard benchmark for VLA evaluation, and almost every VLA paper reports it. Because the standard suites are saturated, LIBERO-100 and particularly LIBERO-10 (the lifelong learning track) are the remaining discriminative splits.