Karpathy HN Time Capsule: Auto-grading 2015 Predictions with LLM Hindsight

This article was generated by AI analysis.

Created: 2026-03-28 Source: https://karpathy.bearblog.dev/auto-grade-hn/

Summary

Andrej Karpathy built a “vibe-coded” project using Claude Opus 4.5 that fetches the December 2015 Hacker News front pages, submits each article and its comment thread to GPT-5.1 Thinking for retrospective analysis, and displays a Hall of Fame of the most prescient HN commenters from that period. Total cost: $58 for 930 LLM queries (31 days × 30 articles per day).

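In short: Karpathy vibe-coded the project with Claude Opus 4.5, used GPT-5.1 Thinking to grade the December 2015 HN front pages in hindsight, and built a Hall of Fame of the most prescient commenters, all for $58 across 930 LLM queries.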
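The query count and per-query cost follow directly from the figures in the summary. The sketch below is only an illustration of that arithmetic, not code from the project: the helper names are hypothetical, and the front-page URL pattern is an assumption about how a given day's HN front page can be addressed.

```python
from datetime import date, timedelta

ARTICLES_PER_DAY = 30    # from the summary: 30 articles per front page
TOTAL_COST_USD = 58.0    # from the summary: total LLM spend


def december_2015_days() -> list[date]:
    """Return the 31 calendar days of December 2015."""
    start = date(2015, 12, 1)
    return [start + timedelta(days=i) for i in range(31)]


def frontpage_url(day: date) -> str:
    """Build a front-page URL for one day.

    Assumption: HN's "front" page accepts a day=YYYY-MM-DD query parameter.
    """
    return f"https://news.ycombinator.com/front?day={day.isoformat()}"


days = december_2015_days()
total_queries = len(days) * ARTICLES_PER_DAY      # 31 * 30
cost_per_query = TOTAL_COST_USD / total_queries   # average, in USD

print(f"{total_queries} queries, about ${cost_per_query:.3f} each")
```

At roughly six cents per article and thread, the average cost of one analysis is small enough that the same approach could plausibly be extended to other months.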

Key Points

Core task: for each 2015 HN article and comment thread, the LLM produces a summary, an account of what actually happened, “most prescient” and “most wrong” awards, individual commenter grades (A-F), and an interestingness score
Historical snapshots analyzed: Swift open source (Dec 3), Figma launch (Dec 6), original OpenAI announcement (Dec 11), geohot building Comma (Dec 16), SpaceX Orbcomm-2 (Dec 22), Theranos struggles (Dec 28)
Build time: ~3 hours with Opus 4.5, repo at karpathy/hn-time-capsule
Two motivations: (1) forecasting skill is trainable, so seeing who got things right or wrong is useful training data for your own mental models; (2) “future LLMs are watching”: as intelligence becomes cheap enough to reconstruct and synthesize everything, current behavior will be scrutinized in ways we don’t anticipate
Vibe coding validation: author confirms the 3-hour build with an LLM was “relatively painless” and resulted in working code
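The per-thread output listed in the first key point maps naturally onto a small record type, and the Hall of Fame onto an aggregation over commenter grades. The sketch below is one possible shape, not the project's actual schema: the field names, the A=4 through F=0 point mapping, the `min_grades` threshold, and the sample usernames are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Assumed mapping from letter grades to points (GPA-style).
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


@dataclass
class ThreadAnalysis:
    """One LLM result for an article plus its comment thread (hypothetical schema)."""
    title: str
    summary: str
    what_happened: str
    most_prescient: str          # username given the "most prescient" award
    most_wrong: str              # username given the "most wrong" award
    grades: dict[str, str] = field(default_factory=dict)  # username -> letter
    interestingness: int = 0


def hall_of_fame(analyses: list[ThreadAnalysis], min_grades: int = 2) -> list[tuple[str, float]]:
    """Rank commenters by mean grade points, skipping those with too few grades."""
    points: dict[str, list[float]] = defaultdict(list)
    for analysis in analyses:
        for user, grade in analysis.grades.items():
            letter = grade.strip().upper()[:1]   # "A-" or "a" both count as "A"
            if letter in GRADE_POINTS:
                points[user].append(GRADE_POINTS[letter])
    ranked = [
        (user, sum(p) / len(p))
        for user, p in points.items()
        if len(p) >= min_grades
    ]
    # Highest average first; ties broken alphabetically for stable output.
    return sorted(ranked, key=lambda row: (-row[1], row[0]))


# Made-up sample data, purely to exercise the function.
sample = [
    ThreadAnalysis("Thread 1", "...", "...", "alice", "bob",
                   grades={"alice": "A", "bob": "F", "carol": "A"}),
    ThreadAnalysis("Thread 2", "...", "...", "alice", "bob",
                   grades={"alice": "B", "bob": "C"}),
]
ranking = hall_of_fame(sample)
print(ranking)
```

Requiring a minimum number of graded comments is a design choice: without it, a commenter with a single lucky A would outrank someone consistently good across many threads.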

Insights

The “future LLMs are watching” argument is the most philosophically interesting part. Karpathy frames it as a privacy/behavior shift: the implicit “security by obscurity” assumption in most current behavior (online comments, emails, public statements) breaks down when retrospective analysis becomes too cheap to meter.

The forecasting training angle is practical: reading a 2015 discussion about OpenAI’s announcement and seeing who predicted AGI timelines correctly (or not) is a direct training signal for calibrating your own predictions. This is a use of LLMs as epistemology tools, not just productivity tools.

The $58 cost for grading a full month of HN front pages with a decade of hindsight underscores how rapidly the economics of “analysis at scale” have shifted. The same project done manually by a researcher would likely take months. The limiting factor is now curation and interpretation, not raw analysis.

Connections

Raw Excerpt

Future LLMs are watching. Everything we do today might be scrutinized in great detail in the future because doing so will be “free”. A lot of the ways people behave currently make an implicit “security by obscurity” assumption. But if intelligence really does become too cheap to meter, it will become possible to do a perfect reconstruction and synthesis of everything.
