Summary

Wen explains database indexes through the lens of a job interview experience — specifically how he was asked about index internals and couldn’t fully answer. The article traces the evolution from BST → AVL → B-Tree → B+ Tree, explaining why B+ Tree is the default structure for MySQL/InnoDB indexes, and covers when to use indexes, index failure scenarios, and the practical optimization of keeping internal nodes in memory.

Wen 透過面試經歷解釋資料庫索引——他被問到索引原理卻無法完整作答。文章追溯從 BST → AVL → B-Tree → B+ Tree 的演進,解釋為何 B+ Tree 是 MySQL/InnoDB 索引的預設結構,並涵蓋何時使用索引、索引失效情況,以及將非葉子節點保留在記憶體中的實際優化。

Key Points

  • BST degenerates to linked list for sequential inserts (O(n) worst case); AVL fixes balance but tall trees still have high O(h)
  • B+ Tree advantages: data only at leaf nodes (stable O(logN)), leaves linked for range queries, internal nodes store only keys (more keys per node, fits in memory), binary search at each node
  • MySQL/InnoDB optimization: keeps non-leaf nodes in memory — only leaf node access requires I/O, dramatically improving performance for large datasets
  • When NOT to index: write-heavy tables (index maintenance overhead); columns with low cardinality; small tables
  • Index failure scenarios: != / <> operators; implicit type conversion (varchar column queried as int); function operations on indexed columns; leading wildcard in LIKE; violating leftmost prefix rule in composite indexes

Insights

The “keep internal nodes in memory” optimization is the killer insight that makes B+ Tree practical for databases: because B+ Tree stores all actual data at leaves, the internal nodes are pure index overhead — and they’re small enough to fit in RAM. This means most index lookups require exactly one I/O (the final leaf read), regardless of table size. This only works because B+ Tree separates data from navigation, which B-Tree cannot do.

Connections

Raw Excerpt

在生產環境中的 MySQL,會將非葉子節點放在記憶體裡,所以在索引正常運作的情況下,會先在記憶體中運算,只有到最後的葉子節點,才會使用到 I/O。