This article was generated by AI analysis.
Summary
This article covers the LIMO (Less-Is-More Reasoning) hypothesis: large reasoning models (LRMs) can learn complex chain-of-thought reasoning from surprisingly few examples. The LIMO model reaches 57.1% on AIME and 94.8% on MATH with just 817 training samples, using supervised fine-tuning (SFT) + LoRA.
Key Points
- LRMs (Large Reasoning Models) extend LLMs with long chain-of-thought training (reflection, backtracking, self-validation)
- LIMO hypothesis: complex reasoning abilities are latent in pretrained models; minimal supervised examples suffice to unlock them
- 817 training samples → 57.1% AIME, 94.8% MATH via SFT + LoRA
- Test-time compute: the new frontier is scaling inference effort (letting the model think longer) rather than scaling training
- Full LIMO suite is open-source (arxiv.org/abs/2501.11223)
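To give a feel for why SFT + LoRA on a few hundred samples is cheap, here is a rough sketch of the parameter arithmetic behind LoRA. LoRA freezes a pretrained weight matrix of shape d_out × d_in and trains only a low-rank update B·A, where B is d_out × r and A is r × d_in. The layer size (4096) and rank (8) below are illustrative assumptions, not values reported in the article.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA update B @ A.

    B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    return rank * (d_out + d_in)


# Assumed, illustrative sizes (not taken from the article).
d_out = d_in = 4096
rank = 8

full = full_update_params(d_out, d_in)
lora = lora_update_params(d_out, d_in, rank)
ratio = lora / full

print(f"full fine-tuning: {full:,} trainable parameters per matrix")
print(f"LoRA (rank {rank}): {lora:,} trainable parameters per matrix")
print(f"LoRA trains {ratio:.4%} of the matrix's parameters")
```

Under these assumed sizes the adapter trains 1/256 of the parameters of the matrix it modifies, which is consistent with the note's point that a small supervised signal steers existing capability rather than rebuilding it.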
Insights
The LIMO finding challenges the “more data = better” scaling assumption for reasoning specifically. If reasoning capability is already latent from pretraining, then the value of additional supervised examples is in directing the model toward that capability rather than teaching new skills. This has significant implications for fine-tuning costs and suggests that high-quality, diverse reasoning traces (showing different reasoning strategies) may be far more valuable than large volumes of mediocre examples.
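One way to picture "few but high-quality and diverse" is a selection step that fills a small budget by rotating across reasoning strategies and taking the best-scoring trace from each in turn. The sketch below is purely illustrative and is not LIMO's curation procedure: the `Trace` fields, the `strategy` labels, and the `quality` scores are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Trace(NamedTuple):
    problem_id: str
    strategy: str   # hypothetical label for the reasoning strategy used
    quality: float  # hypothetical quality score, higher is better


def select_traces(candidates: list[Trace], budget: int) -> list[Trace]:
    """Pick up to `budget` traces, rotating across strategies.

    Each round takes the next-best trace from every strategy, so no
    single strategy dominates a small training set.
    """
    by_strategy: dict[str, list[Trace]] = defaultdict(list)
    for trace in candidates:
        by_strategy[trace.strategy].append(trace)
    for group in by_strategy.values():
        group.sort(key=lambda t: t.quality, reverse=True)

    selected: list[Trace] = []
    round_idx = 0
    while len(selected) < budget:
        added = False
        for strategy in sorted(by_strategy):  # fixed order for determinism
            group = by_strategy[strategy]
            if round_idx < len(group) and len(selected) < budget:
                selected.append(group[round_idx])
                added = True
        if not added:  # every strategy is exhausted
            break
        round_idx += 1
    return selected


# Made-up candidates for illustration only.
candidates = [
    Trace("p1", "algebraic", 0.90),
    Trace("p2", "algebraic", 0.80),
    Trace("p3", "backtracking", 0.70),
    Trace("p4", "algebraic", 0.95),
    Trace("p5", "case-analysis", 0.60),
]
picked = select_traces(candidates, budget=3)
print([t.problem_id for t in picked])
```

With a budget of 3, the selection takes one trace per strategy instead of the three highest-scoring traces, which all use the same strategy. That is the trade-off the paragraph above describes: breadth of strategies over raw volume.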
Connections
Raw Excerpt
Large reasoning models (LRMs) are the latest frontier of LLMs, obtained with additional training, exploiting long chain-of-thoughts with reflection, backtracking, and self-validation to tackle challenging reasoning tasks.