Summary

This article surveys the core data structures that underpin modern database engines — from Hash Indexes to B-Trees, LSM Trees, and beyond — explaining the tradeoffs that lead database designers to choose one over another. Despite thousands of different databases existing, only a handful of foundational structures power them all.

本文介紹驅動現代資料庫引擎的核心資料結構,從雜湊索引到 B-Tree、LSM Tree 等,說明各自的取捨與適用場景。儘管資料庫種類繁多,底層結構卻只有少數幾種。

Key Points

  • Hash Indexes: in-memory hash map with disk pointers; O(1) lookups but requires entire index in memory
  • Limitation of Hash Indexes: cannot handle more keys than RAM can hold
  • 11 total structures covered include well-known databases (e.g., Bitcask uses Hash Indexes)
  • Article cross-references real-world databases that use each structure

Insights

The key insight is that database performance characteristics are entirely determined by their underlying data structures. Understanding this layer lets engineers predict performance cliffs and choose the right database for a workload rather than relying on benchmarks alone.

Connections

Raw Excerpt

Databases often employ a number of specialised data structures to ensure high performance storage and retrieval of data. While there exists thousands of databases in the wild, there are only a handful of data structures powering them all!