Summary

Simon Willison’s comprehensive third annual LLM review covers the defining trends of 2025: reasoning models powered by RLVR, the breakout of coding agents led by Claude Code, dominance of Chinese open-weight models, the risks of AI-enabled browsers, and new threat terminology including “lethal trifecta” for prompt injection attacks. The article offers a practitioner’s perspective from someone who ships LLM-powered tools weekly.

Simon Willison 第三年度的 LLM 回顧全面涵蓋 2025 年重大趨勢:RLVR 推理模型、Claude Code 引領的編碼 Agent 爆發、中國開源模型稱霸、AI 瀏覽器安全風險,以及新術語「致命三要素」描述 Prompt Injection 攻擊威脅。

Key Points

  • Reasoning (RLVR) became mainstream: every major lab released at least one reasoning model
  • Claude Code launched quietly bundled in Claude 3.7 announcement; became year’s most impactful product
  • All major labs released CLI coding agents: Claude Code, Codex CLI, Gemini CLI, Qwen Code, Mistral Vibe
  • Chinese open-weight models topped Artificial Analysis leaderboard: DeepSeek, Kimi, GLM, MiniMax
  • “Lethal trifecta”: prompt injection + exfiltration vector + private data = critical agent security flaw
  • MCP emerged and may be short-lived; bash-as-tool and Skills format proposed as simpler alternatives
  • Slop became Merriam-Webster word of the year; AI data centers faced growing public opposition

Insights

Willison’s observation that “agents as LLMs running tools in a loop” unblocked productive conversation is a useful definitional move. His concern about AI browsers is well-founded: these are the highest-risk surface area because they have access to the most sensitive user data (passwords, banking, email). The normalization of YOLO mode in coding agents is a slow-moving safety crisis he flags clearly — the longer it goes without incident, the more normalized and thus dangerous it becomes.

Connections

Raw Excerpt

The most impactful event of 2025 happened in February, with the quiet release of Claude Code. I say quiet because it didn’t even get its own blog post! Anthropic bundled the Claude Code release in as the second item in their post announcing Claude 3.7 Sonnet.