Summary
A translated and annotated summary of Simon Willison’s “2025: The Year in LLMs,” covering the three defining themes of 2025: the rise of reasoning models (RLVR), the maturation of AI agents (especially coding agents), and the emergence of Claude Code as the year’s most influential product launch. The piece argues that 2025’s capability gains came primarily from longer RL training rather than larger model scale, and that coding agents and AI-assisted search are the two proven agent deployment patterns.
Key Points
- 2025 = Year of Reasoning: o1 → o3/o3-mini/o4-mini; RLVR (Reinforcement Learning from Verifiable Rewards) is the core technique
- Capability gains shifted from pre-training scale to RL training length — compute redirected from pre-training to RL
- Reasoning models’ real value: multi-step tool use (plan → execute → observe → adjust), not just logic puzzles
- AI-assisted search finally works: GPT-5 Thinking-style systems answer complex research questions reliably
- Agent definition settled: “LLM systems that accomplish useful work through multi-step tool calls” — not AGI, but genuinely useful
- Claude Code launched quietly in Feb (bundled in Claude 3.7 Sonnet announcement) — became the year’s most influential product
- Major CLI coding agents: Claude Code, Codex CLI, Gemini CLI, Qwen Code, Mistral Vibe, plus vendor-neutral options (Amp, OpenCode, OpenHands CLI)
- Two proven agent deployment patterns: coding and deep search
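The "verifiable rewards" idea behind RLVR can be illustrated with a toy scorer. This is a minimal sketch, not anything from the article: the helper names (`verifiable_reward`, `score_rollouts`) and the last-line answer-extraction convention are assumptions for illustration. The point is that the reward requires no learned judge; a deterministic checker (exact match on a known math answer, a passing unit test) scores each sampled completion, and RL training then upweights the traces that scored 1.

```python
def verifiable_reward(completion: str, expected: str) -> float:
    """Return 1.0 if the completion's final line matches the ground-truth answer."""
    final = completion.strip().splitlines()[-1]  # convention: last line is the answer
    return 1.0 if final == expected else 0.0

def score_rollouts(rollouts: list[str], expected: str) -> list[float]:
    """Score a batch of sampled completions; RL upweights the reward-1.0 traces."""
    return [verifiable_reward(r, expected) for r in rollouts]

rollouts = [
    "Let me reason step by step...\n42",
    "Let me reason step by step...\n41",
]
print(score_rollouts(rollouts, "42"))  # → [1.0, 0.0]
```

Because the checker is binary and automatic, training length (how many rollouts get scored) becomes the lever, which is consistent with the piece's claim that compute moved from pre-training to RL.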
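The agent definition above (multi-step tool calls: plan → execute → observe → adjust) can be sketched as a small loop. The model call is stubbed out, and the tool names and action format are hypothetical, not any vendor's API; this is only meant to show the control flow shared by the coding-agent CLIs listed above.

```python
def run_agent(model, tools, task, max_steps=5):
    """Loop: ask the model to plan a step, execute the chosen tool, feed the
    observation back, and repeat until the model decides to finish."""
    observations = []
    for _ in range(max_steps):
        action = model(task, observations)                 # plan
        if action["tool"] == "finish":
            return action["answer"]
        result = tools[action["tool"]](**action["args"])   # execute
        observations.append(result)                        # observe, then adjust
    return None  # step budget exhausted

# Stub model: first read a file, then finish with what it observed.
def stub_model(task, observations):
    if not observations:
        return {"tool": "read_file", "args": {"path": "README"}}
    return {"tool": "finish", "answer": observations[-1]}

tools = {"read_file": lambda path: f"contents of {path}"}
print(run_agent(stub_model, tools, "summarize README"))  # → contents of README
```

The "not AGI, but genuinely useful" framing falls out of the structure: the loop is bounded, the tools are ordinary functions, and all intelligence sits in the planning step.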
Insights
- “Almost all capability progress in 2025 came from longer RL training, not larger model scale” — this reframes the compute narrative; the scaling law debate shifted from parameters to training regime
- Claude Code launching without a dedicated blog post and still becoming the year’s defining product is a case study in letting the product speak — the engineering community discovered it organically
- The version numbering skip (3.5 → 3.7, skipping 3.6 because the community had already used that name for the silently upgraded 3.5 Sonnet) is a minor but telling detail about how fast the field moves and how naming conventions emerge organically
- The author’s own prediction failure (“agents won’t land in 2025”), followed by updating the definition rather than quietly defending the prediction, is intellectually honest and worth noting as an approach to revising one’s mental models
- Deep research mode (15+ minutes for detailed reports) becoming obsolete within a year because better systems could match quality in seconds shows how quickly “impressive demos” can become baseline expectations
Connections
- Claude Code
- AI Agents
- Reinforcement Learning
- Reasoning Models
- Lessons from Building Claude Code: How We Use Skills
- The Claude Code You Don’t Know: Architecture, Governance, and Engineering Practice (你不知道的 Claude Code)
- AI Agents 101
Raw Excerpt
The most influential event of 2025 was Anthropic quietly releasing Claude Code in February, without even a dedicated blog post; it was simply tucked into the Claude 3.7 Sonnet announcement.