Summary

Andrej Karpathy describes a personal workflow in which an LLM acts as a compiler: raw source documents (articles, papers, repos, images) are collected in a raw/ directory, and the LLM incrementally “compiles” them into a structured markdown wiki, writing summaries, concept articles, backlinks, and other inter-document links. The wiki lives in Obsidian for human browsing. Once it is large enough (~100 articles, ~400K words), the same LLM can answer complex questions by reading the index files it maintains. Outputs (Q&A results, slides, visualizations) are filed back into the wiki, so every query adds to the knowledge base.

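Karpathy describes a personal knowledge-management workflow that treats the LLM as a “compiler”: source documents are stored in a raw/ directory, and the LLM automatically compiles them into a structured Markdown wiki (with summaries, backlinks, and concept articles). Once the wiki is large enough (about 100 articles, 400K words), the LLM can answer complex questions by reading its own index files. The output of each query is also filed back into the wiki, so knowledge accumulates like compound interest.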

Key Points

  • Three-layer architecture: raw/ (source docs) → wiki (LLM-compiled markdown) → outputs (Q&A, slides, charts)
  • LLM maintains the wiki autonomously; the human rarely edits it directly
  • At a scale of ~100 articles, a standard LLM can handle Q&A without a dedicated RAG pipeline: auto-maintained index files and brief summaries are sufficient for retrieval
  • “Linting” pass: LLM health-checks the wiki for inconsistencies, imputes missing data via web search, and proposes new article candidates
  • Obsidian serves as the read-only frontend; Marp for slides; matplotlib for chart outputs
  • Long-term direction: synthetic data generation + fine-tuning so the LLM “knows” the wiki in its weights, not just in context
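
The “linting” idea in the list above can be partly illustrated with a deterministic check. The sketch below is not Karpathy's tooling; it assumes Obsidian-style [[wikilinks]] and a flat mapping from link target to file name, and the function name find_broken_links is hypothetical. It reports links that point at articles which do not exist yet, which an LLM pass could then treat as inconsistencies or as candidates for new articles. The LLM and web-search parts of the health check are out of scope here.

```python
import re
from pathlib import Path

# Captures the target of an Obsidian-style link: [[Target]], [[Target|alias]], [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def find_broken_links(wiki_dir):
    """Return {article name: sorted link targets that have no matching .md file}.

    Assumption: a link [[Foo]] is satisfied by any file named Foo.md under wiki_dir.
    """
    wiki = Path(wiki_dir)
    pages = {p.stem for p in wiki.rglob("*.md")}
    broken = {}
    for page in sorted(wiki.rglob("*.md")):
        text = page.read_text(encoding="utf-8")
        targets = {m.strip() for m in WIKILINK.findall(text)}
        missing = sorted(t for t in targets if t not in pages)
        if missing:
            broken[page.stem] = missing
    return broken
```

Calling find_broken_links("wiki") returns an empty dict when every link resolves, so it can also serve as a cheap pre-check before a more expensive LLM pass.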

Insights

The key architectural insight is that at small-to-medium scale (~400K words), a capable LLM with good index files doesn’t need vector embeddings or a retrieval pipeline; it can navigate a well-organized markdown directory directly. This challenges the common assumption that RAG is necessary for any non-trivial knowledge base.
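
To make the “index files” idea concrete, here is a minimal sketch of what such an index could look like. In the workflow described, the LLM itself writes and maintains the index; the code below is only a deterministic stand-in under stated assumptions: the file name INDEX.md, the one-line-per-article format, and the functions first_summary_line and build_index are all hypothetical.

```python
from pathlib import Path


def first_summary_line(text, limit=160):
    """Return the first non-empty, non-heading line, truncated to `limit` characters."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line[:limit]
    return ""


def build_index(wiki_dir, index_name="INDEX.md"):
    """Write one line per article (link, relative path, brief summary) and return the index text."""
    wiki = Path(wiki_dir)
    entries = []
    for page in sorted(wiki.rglob("*.md")):
        if page.name == index_name:
            continue  # do not index the index itself
        summary = first_summary_line(page.read_text(encoding="utf-8"))
        rel = page.relative_to(wiki).as_posix()
        entries.append(f"- [[{page.stem}]] ({rel}): {summary}")
    index_text = "\n".join(entries) + "\n"
    (wiki / index_name).write_text(index_text, encoding="utf-8")
    return index_text
```

An index like this is small enough to fit in context, so the model can read it first and then open only the few articles whose summaries look relevant.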

The compounding loop is the underrated part: outputs filed back into the wiki mean every query permanently enriches the base rather than disappearing into chat history. This transforms Q&A from a stateless interaction into an investment.
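
A rough sketch of the filing step, assuming outputs are stored as dated markdown notes in an outputs/ subfolder of the wiki and link back to the articles they drew on. The folder name, the file-naming scheme, and the helpers slugify and file_output are illustrative assumptions, not details from the original post.

```python
import re
from datetime import date
from pathlib import Path


def slugify(title):
    """Lowercase the title and collapse anything that is not a-z or 0-9 into single hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def file_output(wiki_dir, question, answer, sources, today=None):
    """Save a Q&A result as a wiki note that links back to its source articles."""
    today = today or date.today()
    out_dir = Path(wiki_dir) / "outputs"  # hypothetical location for filed outputs
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{today.isoformat()}-{slugify(question)}.md"
    links = "\n".join(f"- [[{s}]]" for s in sources)
    path.write_text(
        f"# {question}\n\n{answer}\n\n## Sources\n\n{links}\n",
        encoding="utf-8",
    )
    return path
```

Because the note carries [[wikilinks]] to its sources, it shows up in Obsidian's backlinks for those articles, which is what lets a later query find and reuse it.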

The “vibe coded search engine” comment suggests that even simple keyword search over the wiki, exposed as a CLI tool, is valuable enough to build. That implies the value lies in organization and persistence, not retrieval sophistication.
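
For a sense of how little such a tool needs, here is a minimal keyword-search sketch. It is not Karpathy's search engine; the scoring (raw substring counts of each query term) and the names search and main are assumptions chosen for brevity.

```python
import argparse
from pathlib import Path


def search(wiki_dir, query, top_k=5):
    """Rank .md files by how often the query terms occur (case-insensitive substring counts)."""
    terms = query.lower().split()
    scored = []
    for page in sorted(Path(wiki_dir).rglob("*.md")):
        text = page.read_text(encoding="utf-8").lower()
        score = sum(text.count(t) for t in terms)
        if score > 0:
            scored.append((score, page.as_posix()))
    # Highest score first; ties are broken alphabetically by path.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top_k]


def main(argv=None):
    """Command-line entry point: print 'score<TAB>path' for the best matches."""
    parser = argparse.ArgumentParser(description="Keyword search over a markdown wiki")
    parser.add_argument("wiki_dir")
    parser.add_argument("query")
    parser.add_argument("--top-k", type=int, default=5)
    args = parser.parse_args(argv)
    for score, path in search(args.wiki_dir, args.query, args.top_k):
        print(f"{score}\t{path}")
```

Calling main(["wiki", "index files"]) prints the top matches; invoking main() from a script entry point would turn this into a CLI that an LLM agent can call as a tool.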

Connections

Raw Excerpt

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it’s the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.