Summary

EN: Antirez (creator of Redis) shares his end-of-year reflections on AI progress in 2025 as a series of short bulleted observations. Key claims: LLMs have genuine internal representations (they are not just predicting tokens), chain-of-thought (CoT) is a form of sampling combined with RL, scaling is not limited to raw token count, programmer resistance to AI has dropped significantly, progress on the ARC test is meaningful, and the extinction risk from AI is a serious concern that deserves more attention.

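ZH (English translation): Redis creator antirez shares his year-end reflections on AI progress in 2025 as a bulleted list: LLMs have genuine internal representations, CoT is essentially sampling plus reinforcement learning, scaling is not limited to token count, programmers' resistance to AI has dropped noticeably, progress on the ARC test deserves attention, and the risk of AI-driven human extinction is an issue worth taking seriously.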

Key Points

  • LLMs have representations: internal activations encode semantic structure, not just surface pattern matching
  • CoT = sampling + RL: chain-of-thought reasoning is better understood as a search process guided by reinforcement signals, not just “thinking step by step”
  • Scaling beyond tokens: scaling is not bounded by raw parameter or token count; data quality, architecture choices, and training methodology all matter
  • Programmer acceptance: resistance to using AI in coding workflows has dropped dramatically, marking a cultural shift
  • ARC test progress: progress on Chollet’s ARC benchmark (designed to resist memorization) signals a genuine improvement in abstraction capability
  • Extinction risk: antirez explicitly names AI-driven human extinction as a concern deserving serious attention, not dismissal
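The “CoT = sampling + RL” framing can be illustrated with a toy loop. This is a minimal sketch under loud assumptions, not antirez's description and not any lab's training code: the three "reasoning paths" are hypothetical hand-written functions standing in for sampled chains of thought, the verifier's target rule is invented, and the update is plain REINFORCE with a batch-mean baseline. It only shows the shape of the idea: sample several candidate paths, reward the ones whose final answer checks out, and shift probability mass toward them.

```python
import math
import random

# Hypothetical stand-ins for sampled chains of thought: each "path" maps an
# input number to a final answer through a different sequence of steps.
PATHS = {
    "add-then-double": lambda x: (x + 3) * 2,
    "double-then-add": lambda x: x * 2 + 3,
    "square": lambda x: x * x,
}


def verifier(x: int, answer: int) -> float:
    """Verifiable reward: 1.0 if the answer matches the target, else 0.0.
    The target rule (x + 3) * 2 is an arbitrary assumption for this toy."""
    return 1.0 if answer == (x + 3) * 2 else 0.0


def softmax(logits: dict) -> dict:
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def train(steps: int = 500, samples_per_step: int = 8,
          lr: float = 0.5, seed: int = 0) -> dict:
    rng = random.Random(seed)
    logits = {name: 0.0 for name in PATHS}  # start from a uniform policy
    for _ in range(steps):
        probs = softmax(logits)
        names = list(probs)
        weights = [probs[n] for n in names]
        x = rng.randint(1, 10)
        # Sampling: draw several candidate reasoning paths from the policy.
        batch = rng.choices(names, weights=weights, k=samples_per_step)
        rewards = [verifier(x, PATHS[n](x)) for n in batch]
        baseline = sum(rewards) / len(rewards)
        # RL: REINFORCE update. d log softmax(n) / d logit(k) = 1[k == n] - p(k).
        for n, r in zip(batch, rewards):
            advantage = r - baseline
            for k in names:
                grad = (1.0 if k == n else 0.0) - probs[k]
                logits[k] += lr * advantage * grad / samples_per_step
    return softmax(logits)


if __name__ == "__main__":
    final = train()
    for name, p in sorted(final.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {p:.3f}")
```

Running it prints the learned probability of each path; the mass should concentrate on the one path the verifier rewards. Because the baseline is the batch mean, a batch in which every sample gets the same reward produces no update, which keeps the learning signal tied to relative success among the sampled candidates.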

Insights

  • The CoT as “sampling + RL” framing is technically precise and important: it reframes reasoning models as search algorithms, not oracle-like thinkers
  • The ARC point is significant coming from a skeptical technical source: ARC was designed specifically to foil benchmark memorization, so progress there is harder to dismiss
  • Antirez’s credibility as the creator of Redis, a practical systems thinker, makes his AI takes less hype-driven than most
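The ARC point rests on how the benchmark is built: each task gives only a few input/output grid pairs demonstrating a rule, and the test input is new, so the answer has to be inferred rather than recalled. The sketch below follows the layout of the public ARC task files (a JSON object with "train" and "test" lists of "input"/"output" grids of integers 0 to 9), but the task itself, its hidden rule, and the four candidate transformations are hypothetical. Brute-force search over a tiny hand-written hypothesis space is only a stand-in for whatever a real solver or model does.

```python
from typing import Callable, Dict, List, Optional

Grid = List[List[int]]

# A made-up task in the ARC JSON layout. The hidden rule (mirror each row
# left-to-right) is invented for illustration.
TASK: dict = {
    "train": [
        {"input": [[1, 0], [2, 3]], "output": [[0, 1], [3, 2]]},
        {"input": [[5, 5, 0]], "output": [[0, 5, 5]]},
    ],
    "test": [{"input": [[7, 0, 4], [0, 9, 0]]}],
}

# Hypothetical, deliberately tiny hypothesis space of grid transformations.
CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: [row[:] for row in g],
    "flip_horizontal": lambda g: [row[::-1] for row in g],
    "flip_vertical": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}


def infer_rule(task: dict) -> Optional[str]:
    """Return the first candidate consistent with every demonstration pair."""
    for name, fn in CANDIDATES.items():
        if all(fn(pair["input"]) == pair["output"] for pair in task["train"]):
            return name
    return None


def solve(task: dict) -> List[Grid]:
    """Apply the inferred rule to each unseen test input."""
    name = infer_rule(task)
    if name is None:
        raise ValueError("no candidate transformation fits the demonstrations")
    fn = CANDIDATES[name]
    return [fn(pair["input"]) for pair in task["test"]]
```

Here `infer_rule(TASK)` picks `flip_horizontal`, the only candidate consistent with both demonstrations, and `solve(TASK)` applies it to the unseen test grid. Real ARC tasks defeat hypothesis spaces this small, which is part of why the note treats progress on the benchmark as harder to dismiss.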

Connections

  • Directly relates to the LLM benchmark critique in this vault: ARC is cited as the counter-example to Goodharting
  • The scaling-beyond-tokens observation connects to PromptWizard: prompt quality is itself a scaling lever
  • The programmer-acceptance shift connects to the 70% problem and AI revolution articles; cultural resistance is declining

Raw Excerpt

“Chain-of-Thought isn’t thinking — it’s sampling. The model generates candidate reasoning paths and reinforcement learning selects the ones that lead to correct answers. Understanding this changes how you should think about reasoning models: they’re search algorithms, not minds.”