Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores

This note was generated by AI analysis.

Created: 2026-03-28 Source: https://www.vldb.org/pvldb/vol13/p3411-armbrust.pdf

Summary

Armbrust et al. (Databricks/Stanford/Berkeley, VLDB 2020) present Delta Lake, an open-source ACID table storage layer over cloud object stores. It uses a transaction log, compacted into Apache Parquet format, to provide ACID transactions, time travel, and fast metadata operations over stores such as S3 and Azure Blob, without running a dedicated metadata server.

Prerequisites

Cloud object stores (S3, Azure Blob) and their key-value store consistency model
Apache Parquet columnar file format
ACID transactions and write-ahead logs (WAL)
Apache Spark data processing

Core Idea

Cloud object stores lack cross-key atomicity, so a writer cannot update multiple files as one consistent unit. Delta Lake solves this by maintaining a transaction log, stored in the object store itself, that records which Parquet files belong to a table at each point in time. All mutations go through optimistic concurrency control against the log: readers always see a consistent snapshot, and writers detect and resolve conflicts via the log. Because all metadata lives in the object store (no separate server), compute and storage scale independently.
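
The log-replay idea can be illustrated with a minimal sketch. It assumes a hypothetical in-memory dictionary in place of the object store and a simplified action schema (`op`, `path`) that is not Delta Lake's actual record format; the helper names `write_commit` and `snapshot` are also invented for illustration. Replaying the records in version order yields the set of live files, and stopping early at a given version gives a historical snapshot.

```python
import json
from typing import Dict, List, Optional, Set

# Hypothetical stand-in for an object store: key -> bytes.
store: Dict[str, bytes] = {}


def log_key(version: int) -> str:
    # Zero-padded names keep log records ordered under a lexicographic listing.
    return f"_delta_log/{version:020d}.json"


def write_commit(version: int, actions: List[dict]) -> None:
    # Simplified: no conflict detection here, just one record per version.
    store[log_key(version)] = json.dumps(actions).encode()


def snapshot(as_of: Optional[int] = None) -> Set[str]:
    """Replay log records in order and return the set of live data files."""
    live: Set[str] = set()
    for key in sorted(k for k in store if k.startswith("_delta_log/")):
        version = int(key.split("/")[1].split(".")[0])
        if as_of is not None and version > as_of:
            break  # time travel: ignore commits after the requested version
        for action in json.loads(store[key]):
            if action["op"] == "add":
                live.add(action["path"])
            elif action["op"] == "remove":
                live.discard(action["path"])
    return live


write_commit(0, [{"op": "add", "path": "part-0.parquet"}])
write_commit(1, [{"op": "add", "path": "part-1.parquet"}])
# A rewrite (for example an UPSERT) removes the old file and adds its replacement.
write_commit(2, [{"op": "remove", "path": "part-0.parquet"},
                 {"op": "add", "path": "part-2.parquet"}])
```

Calling `snapshot()` returns the latest table state, while `snapshot(as_of=1)` returns the files that were live after the second commit. Data files are never modified in place in this sketch; only the log decides which ones are visible.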

Key features enabled by the log:

ACID transactions: multi-object updates are atomic
Time travel: query any historical table snapshot via the log
UPSERT/DELETE/MERGE: rewrite affected Parquet files transactionally
Streaming I/O: low-latency small writes coalesced later by compaction
Fast metadata: per-file min/max statistics in the log enable data skipping without reading every Parquet file footer
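
The fast-metadata point can be made concrete with a small sketch of data skipping. It assumes each log entry carries min/max values for a single hypothetical integer column `ts`; the `FileEntry` type and `files_to_scan` function are illustrative names, not part of any Delta Lake API. A range query then needs to open only the files whose value range overlaps the predicate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileEntry:
    path: str
    min_ts: int  # smallest value of the hypothetical column `ts` in the file
    max_ts: int  # largest value of `ts` in the file


def files_to_scan(entries: List[FileEntry], lo: int, hi: int) -> List[str]:
    """Return files whose [min_ts, max_ts] range overlaps the query range [lo, hi]."""
    return [e.path for e in entries if e.max_ts >= lo and e.min_ts <= hi]


entries = [
    FileEntry("part-0.parquet", 0, 99),
    FileEntry("part-1.parquet", 100, 199),
    FileEntry("part-2.parquet", 150, 300),
]

# Query: WHERE ts BETWEEN 120 AND 160
selected = files_to_scan(entries, 120, 160)
```

Here `part-0.parquet` is skipped using only metadata from the log. The benefit grows when data is laid out so that file ranges overlap as little as possible.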

Results

Deployed at thousands of Databricks customers processing exabytes/day
Reduced the share of Databricks support issues related to cloud storage from about half to nearly none
Query speedups up to 100x for high-dimensional datasets (network security, bioinformatics) via data layout optimization and fast statistics access
Supports Apache Spark, Hive, Presto, Redshift, Snowflake connectors

Limitations

Author-stated:

Optimistic concurrency can cause write conflicts under high-write contention (though mitigations exist)
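Small-object performance depends on periodic compaction
Transactions are limited to a single table, because each table has its own log; there are no true cross-table transactions

Unstated:

The transaction log can grow large for high-churn tables; periodic checkpointing (log compaction) is needed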
Performance heavily dependent on Spark ecosystem; non-Spark connectors are less mature
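
The dependence on compaction can be sketched as a planning step followed by one transactional rewrite per group of small files. This is an illustrative greedy grouping under assumed names (`plan_compaction`, `compaction_actions`) and the same simplified `op`/`path` action schema; it is not the algorithm Delta Lake itself uses.

```python
from typing import Dict, List


def plan_compaction(file_sizes: Dict[str, int], target: int) -> List[List[str]]:
    """Greedily group files smaller than `target` so each group totals at most `target`."""
    bins: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for path, size in sorted(file_sizes.items()):
        if size >= target:
            continue  # already large enough; leave untouched
        if current and current_size + size > target:
            bins.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        bins.append(current)
    # A group with a single file gains nothing from a rewrite.
    return [b for b in bins if len(b) > 1]


def compaction_actions(group: List[str], merged_path: str) -> List[dict]:
    """One commit: remove the small files and add the merged file atomically."""
    removes = [{"op": "remove", "path": p} for p in group]
    return removes + [{"op": "add", "path": merged_path}]


sizes = {"a": 10, "b": 20, "c": 80, "d": 5, "e": 200}
groups = plan_compaction(sizes, target=100)
```

Because each group is replaced in a single commit, readers see either the old small files or the merged file, never a mix of both.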

Reproducibility

Code: Open source at github.com/delta-io/delta
Data: Production workloads at Databricks; no public benchmark dataset
Compute: Not applicable (service-level evaluation)

Insights

The core insight is to store the transaction log inside the same object store as the data, which sidesteps the need for a metadata service while enabling ACID semantics. This is architecturally clever: it builds multi-object atomicity on a single-object primitive, an atomic put-if-absent (or rename) of the next log record, where the store offers one. The "lakehouse" framing (combining data lake cost with data warehouse features) became influential, and Apache Iceberg and Apache Hudi take a similar table-format approach. The drop in cloud storage support issues from about half to nearly none is a striking operational result that supports the practical value of transactional storage.
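
How a single-object primitive yields atomic commits can be sketched as follows. The `LogStore` class is a hypothetical in-memory stand-in whose `put_if_absent` is made atomic with a lock; the `commit` function and its retry limit are likewise assumptions for illustration. Each writer tries to claim the next version number, and exactly one writer can succeed per version.

```python
import threading
from typing import Dict, List


class LogStore:
    """Hypothetical in-memory log with an atomic put-if-absent on version numbers."""

    def __init__(self) -> None:
        self._records: Dict[int, List[dict]] = {}
        self._lock = threading.Lock()

    def put_if_absent(self, version: int, actions: List[dict]) -> bool:
        with self._lock:
            if version in self._records:
                return False  # another writer already claimed this version
            self._records[version] = actions
            return True

    def latest_version(self) -> int:
        with self._lock:
            return max(self._records, default=-1)


def commit(store: LogStore, actions: List[dict], max_retries: int = 10) -> int:
    """Optimistically claim the next version, retrying if another writer wins."""
    for _ in range(max_retries):
        version = store.latest_version() + 1
        if store.put_if_absent(version, actions):
            return version
        # Lost the race. A real implementation would first check whether the
        # winning commit conflicts with what this transaction read.
    raise RuntimeError("too many conflicting commits")


log = LogStore()
first = commit(log, [{"op": "add", "path": "part-0.parquet"}])
second = commit(log, [{"op": "add", "path": "part-1.parquet"}])
```

The sketch always retries after losing a race, which is safe only for blind appends. Transactions that read the table would need the conflict check noted in the comment before retrying.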

Connections

Raw Excerpt

The core idea of Delta Lake is simple: we maintain information about which objects are part of a Delta table in an ACID manner, using a write-ahead log that is itself stored in the cloud object store. This means that no servers need to be running to maintain state for a Delta table; users only need to launch servers when running queries.
