This note was generated by AI analysis.
Created: 2026-04-03
Source: https://x.com/karpathy/status/2039805659525644595
Summary
Andrej Karpathy describes a personal workflow where LLMs act as compilers: raw source documents (articles, papers, repos, images) are ingested into a raw/ directory, and an LLM incrementally “compiles” them into a structured markdown wiki — writing summaries, backlinks, concept articles, and inter-document links. The wiki lives in Obsidian for human browsing, and once large enough (~100 articles, ~400K words), the same LLM can answer complex questions by reading its own index files. Outputs (Q&A results, slides, visualizations) are filed back into the wiki, so every query compounds the knowledge base.
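The incremental "compile" step can be sketched as follows. This is a minimal illustration, not Karpathy's actual scripts. It assumes plain-text markdown sources in raw/, a hypothetical `summarize` callable standing in for the LLM call, and a hypothetical `.manifest.json` file of content hashes, so that only new or changed sources are recompiled. The `[[...]]` backlink uses Obsidian's wikilink syntax.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def file_digest(path: Path) -> str:
    """Content hash used to detect new or modified raw sources."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compile_incrementally(raw_dir: Path, wiki_dir: Path,
                          summarize: Callable[[str], str]) -> list[str]:
    """Compile only the raw files whose content changed since the last run.

    `summarize` is a hypothetical stand-in for an LLM call, and the manifest
    name `.manifest.json` is an assumption of this sketch.
    Returns the names of the wiki pages that were (re)written.
    """
    wiki_dir.mkdir(parents=True, exist_ok=True)
    manifest_path = wiki_dir / ".manifest.json"
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    written = []
    for src in sorted(raw_dir.glob("*.md")):
        digest = file_digest(src)
        if manifest.get(src.name) == digest:
            continue  # unchanged since the last compile: skip the expensive LLM call
        summary = summarize(src.read_text(encoding="utf-8"))
        page = wiki_dir / f"{src.stem}.md"
        # Each wiki page links back to its raw source so claims stay traceable.
        page.write_text(f"# {src.stem}\n\n{summary}\n\nSource: [[raw/{src.name}]]\n",
                        encoding="utf-8")
        manifest[src.name] = digest
        written.append(page.name)
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return written
```

Running it twice in a row calls `summarize` only on the first pass; editing a source file makes just that page recompile on the next run.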
Key Points
- Three-layer architecture: raw/ (source docs) → wiki (LLM-compiled markdown) → outputs (Q&A, slides, charts)
- The LLM maintains the wiki autonomously; the human rarely edits it directly
- At ~100 articles, a standard LLM can handle Q&A without an elaborate RAG setup: auto-maintained index files plus brief per-document summaries are enough for retrieval
- “Linting” pass: LLM health-checks the wiki for inconsistencies, imputes missing data via web search, and proposes new article candidates
- Tooling: Obsidian as the browsing frontend (the human mostly reads and rarely edits), Marp for slides, matplotlib for charts
- Long-term direction: synthetic data generation plus fine-tuning, so the LLM “knows” the wiki in its weights rather than only through its context window
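Part of the "linting" pass is mechanical and needs no model at all. The sketch below covers only that part, as an assumption about what a health check might include: it flags wikilinks that point at pages that do not exist and pages that no other page links to. Spotting factual inconsistencies, imputing missing data via web search, and proposing new articles would remain LLM work. `lint_wiki` and its input shape (a dict mapping page name to markdown text) are hypothetical.

```python
import re
from collections import defaultdict

# Matches [[Target]], [[Target|alias]] and [[Target#heading]]; captures "Target".
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")


def lint_wiki(pages: dict[str, str]) -> dict[str, list[str]]:
    """Mechanical health check over {page_name: markdown_text}.

    Reports links to missing pages and orphan pages with no inbound links
    (candidates for new backlinks or merging).
    """
    inbound = defaultdict(set)
    broken = []
    for name, text in pages.items():
        for target in WIKILINK.findall(text):
            target = target.strip()
            if target in pages:
                if target != name:  # self-links do not count as inbound links
                    inbound[target].add(name)
            else:
                broken.append(f"{name} -> {target}")
    orphans = [name for name in pages if not inbound[name]]
    return {"broken_links": sorted(broken), "orphans": sorted(orphans)}
```

A report like this is cheap to produce on every run, leaving the model to decide how to fix each finding.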
Insights
The key architectural insight is that at small-to-medium scale (~400K words), a capable LLM with good index files doesn’t need vector embeddings or a retrieval pipeline; it can navigate a well-organized markdown directory directly. This challenges the common assumption that any non-trivial knowledge base requires RAG.
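An index file of the kind described might resemble the output of this sketch. Assumptions: pages arrive as a dict of name to markdown, and the first non-heading line stands in for the brief summary that, in the described workflow, the LLM itself writes and maintains. The idea is that the model reads `index.md` first and then opens only the few pages it needs, instead of querying an embedding index.

```python
def first_sentence(text: str, max_chars: int = 160) -> str:
    """Use the first non-empty, non-heading line as a one-line summary."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line[:max_chars]
    return ""


def build_index(pages: dict[str, str]) -> str:
    """Render an index.md listing every page with a short summary."""
    lines = ["# Index", ""]
    for name in sorted(pages):
        lines.append(f"- [[{name}]]: {first_sentence(pages[name])}")
    return "\n".join(lines) + "\n"
```

For `{"b": "# B\n\nSecond page body.", "a": "First page.\nMore."}` this yields an index with `- [[a]]: First page.` followed by `- [[b]]: Second page body.`.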
The compounding loop is the underrated part: outputs filed back into the wiki mean every query permanently enriches the base rather than disappearing into chat history. This transforms Q&A from a stateless interaction into an investment.
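Filing an answer back into the wiki can be as simple as writing it as a new page that links to the pages it drew on, then registering it in the index so later queries can find it. A minimal sketch; the `outputs/` folder, the date-prefixed file naming, and the `file_answer` and `slugify` helpers are hypothetical choices, not details from the post.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional


def slugify(text: str) -> str:
    """Lowercase, collapse non-alphanumerics to '-', cap the length."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower())[:60].strip("-") or "untitled"


def file_answer(wiki_dir: Path, question: str, answer: str,
                sources: list[str], today: Optional[date] = None) -> Path:
    """Write a Q&A result as a wiki page and append it to index.md."""
    today = today or date.today()
    out_dir = wiki_dir / "outputs"
    out_dir.mkdir(parents=True, exist_ok=True)
    page = out_dir / f"{today.isoformat()}-{slugify(question)}.md"
    links = "\n".join(f"- [[{s}]]" for s in sources)
    page.write_text(f"# {question}\n\n{answer}\n\n## Sources\n{links}\n",
                    encoding="utf-8")
    # Register the new page so future queries that read the index can find it.
    with (wiki_dir / "index.md").open("a", encoding="utf-8") as f:
        f.write(f"- [[outputs/{page.stem}]]: {question}\n")
    return page
```

Because the answer page links to its sources, it also adds backlinks to them, so each filed answer enriches the link graph as well as the page count.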
The “vibe coded search engine” remark suggests that even simple keyword search over the wiki, exposed as a CLI tool, is worth building. That implies the value lies in organization and persistence rather than in retrieval sophistication.
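A keyword search tool of the sort the remark points to could start as small as this. It is a hypothetical stand-in, since the post does not describe the actual tool: it tokenizes on lowercase ASCII alphanumerics, scores each page by how many of its tokens match a query term, and prints the top results. No embeddings or ranking model are involved. The `search` and `main` names and the `--wiki` and `-k` flags are illustrative.

```python
import argparse
import re
from pathlib import Path

TOKEN = re.compile(r"[a-z0-9]+")


def search(wiki_dir: Path, query: str, top_k: int = 5) -> list[tuple[int, str]]:
    """Rank wiki pages by how many of their tokens match a query term."""
    terms = set(TOKEN.findall(query.lower()))
    results = []
    for page in wiki_dir.rglob("*.md"):
        words = TOKEN.findall(page.read_text(encoding="utf-8").lower())
        score = sum(1 for w in words if w in terms)
        if score:
            results.append((score, page.relative_to(wiki_dir).as_posix()))
    # Highest score first; ties broken alphabetically by path.
    results.sort(key=lambda r: (-r[0], r[1]))
    return results[:top_k]


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Keyword search over a markdown wiki")
    parser.add_argument("query")
    parser.add_argument("--wiki", type=Path, default="wiki")
    parser.add_argument("-k", type=int, default=5, help="number of results")
    args = parser.parse_args(argv)
    for score, path in search(args.wiki, args.query, args.k):
        print(f"{score:4d}  {path}")
```

Calling `main(["index files", "--wiki", "wiki"])` behaves like the command-line invocation with those arguments; hooking `main` into a console-script entry point or an `if __name__ == "__main__":` guard makes it runnable from a shell, where an LLM agent can call it like any other CLI.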
Connections
- karpathy is showing one of the simplest AI architectures that actually works… — JUMPERZ’s thread directly comments on this post and extrapolates to multi-agent architectures
- The NotebookLM Workflow That Changed How I Learn Any Technology — NotebookLM implements a similar triangulated-source + AI synthesis approach, but as a hosted product rather than a local workflow
- knowledge-management
- obsidian
- personal-wiki
Raw Excerpt
TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it’s the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.