本文由 AI 分析生成
Summary
Wen explains database indexes through the lens of a job interview experience — specifically how he was asked about index internals and couldn’t fully answer. The article traces the evolution from BST → AVL → B-Tree → B+ Tree, explaining why B+ Tree is the default structure for MySQL/InnoDB indexes, and covers when to use indexes, index failure scenarios, and the practical optimization of keeping internal nodes in memory.
Wen 透過面試經歷解釋資料庫索引——他被問到索引原理卻無法完整作答。文章追溯從 BST → AVL → B-Tree → B+ Tree 的演進,解釋為何 B+ Tree 是 MySQL/InnoDB 索引的預設結構,並涵蓋何時使用索引、索引失效情況,以及將非葉子節點保留在記憶體中的實際優化。
Key Points
- BST degenerates to linked list for sequential inserts (O(n) worst case); AVL fixes balance but tall trees still have high O(h)
- B+ Tree advantages: data only at leaf nodes (stable O(logN)), leaves linked for range queries, internal nodes store only keys (more keys per node, fits in memory), binary search at each node
- MySQL/InnoDB optimization: keeps non-leaf nodes in memory — only leaf node access requires I/O, dramatically improving performance for large datasets
- When NOT to index: write-heavy tables (index maintenance overhead); columns with low cardinality; small tables
- Index failure scenarios:
!=/<>operators; implicit type conversion (varchar column queried as int); function operations on indexed columns; leading wildcard in LIKE; violating leftmost prefix rule in composite indexes
Insights
The “keep internal nodes in memory” optimization is the killer insight that makes B+ Tree practical for databases: because B+ Tree stores all actual data at leaves, the internal nodes are pure index overhead — and they’re small enough to fit in RAM. This means most index lookups require exactly one I/O (the final leaf read), regardless of table size. This only works because B+ Tree separates data from navigation, which B-Tree cannot do.
Connections
Raw Excerpt
在生產環境中的 MySQL,會將非葉子節點放在記憶體裡,所以在索引正常運作的情況下,會先在記憶體中運算,只有到最後的葉子節點,才會使用到 I/O。