This note was generated by AI analysis
Created: 2026-03-28
Summary
Instagram’s engineering team describes how they designed a distributed ID generation system to support sharding data across many PostgreSQL instances. The solution packs a 41-bit millisecond timestamp, a 13-bit logical shard ID, and a 10-bit auto-increment sequence into a 64-bit integer, implemented entirely in PostgreSQL’s PL/pgSQL without any external service. The resulting IDs are time-sortable, globally unique, compact, and encode which logical shard owns the data.
Key Points
- 64-bit ID structure: 41 bits timestamp (ms) + 13 bits logical shard ID + 10 bits auto-increment (mod 1024)
- 41-bit timestamp: 2^41 milliseconds gives about 69.7 years of range, counted from a custom epoch (2011-01-01)
- 13-bit shard ID: supports 8,192 logical shards; shards mapped to physical DBs in application code
- 10-bit sequence: at most 1,024 IDs per shard per millisecond
- Implementation: next_id() PL/pgSQL function set as the column DEFAULT, so no application code changes are needed
- Logical vs physical shards: start with few physical DBs and move logical shards as you scale; no re-bucketing required
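
The bit layout in the list above can be sketched in Python. This is a minimal illustration, not Instagram’s PL/pgSQL function: the names make_id, split_id, and EPOCH_MS are hypothetical, the epoch is assumed to be midnight UTC on 2011-01-01, and the caller is assumed to supply the clock reading and the sequence value.

```python
from datetime import datetime, timezone

TIMESTAMP_BITS = 41
SHARD_BITS = 13
SEQ_BITS = 10

# Assumption: the custom epoch is 2011-01-01 00:00:00 UTC, in milliseconds.
EPOCH_MS = int(datetime(2011, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)


def make_id(now_ms: int, shard_id: int, seq: int) -> int:
    """Pack (timestamp, logical shard, sequence) into one 64-bit integer."""
    elapsed = now_ms - EPOCH_MS
    if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp outside the 41-bit range")
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id outside the 13-bit range")
    return (
        (elapsed << (SHARD_BITS + SEQ_BITS))  # high 41 bits: ms since epoch
        | (shard_id << SEQ_BITS)              # middle 13 bits: logical shard
        | (seq % (1 << SEQ_BITS))             # low 10 bits: sequence mod 1024
    )


def split_id(id_value: int) -> tuple[int, int, int]:
    """Recover (timestamp in ms, logical shard, sequence) from an ID."""
    seq = id_value & ((1 << SEQ_BITS) - 1)
    shard_id = (id_value >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
    elapsed = id_value >> (SHARD_BITS + SEQ_BITS)
    return elapsed + EPOCH_MS, shard_id, seq
```

Because the timestamp occupies the highest bits, an ID created in a later millisecond always compares greater than one created earlier, whatever the shard or sequence values are. That is what makes plain integer ordering match creation order at millisecond granularity.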
Insights
- The logical/physical shard separation is the key architectural insight: it decouples the ID space from physical infrastructure, allowing future scaling without data migration
- Implementing in PL/pgSQL is elegant: the database generates each ID as part of the insert itself, so no coordination between nodes is needed
- Because the 10-bit sequence is taken mod 1024, more than 1,024 inserts on one shard within a single millisecond would wrap the sequence and produce duplicate IDs; this design constraint was acceptable for Instagram’s throughput
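
Two of the insights above can be sketched in Python: moving a logical shard to a new physical database without changing any ID, and the sequence wraparound. This is a sketch under stated assumptions. The host names, the range-based mapping tables, and the helpers pack, shard_of, and host_for are all hypothetical; the source only says that the mapping lives in application code.

```python
SHARD_BITS = 13
SEQ_BITS = 10


def pack(elapsed_ms: int, shard_id: int, seq: int) -> int:
    """Build an ID from ms since epoch, logical shard, and raw sequence value."""
    return (
        (elapsed_ms << (SHARD_BITS + SEQ_BITS))
        | (shard_id << SEQ_BITS)
        | (seq % (1 << SEQ_BITS))
    )


def shard_of(id_value: int) -> int:
    """Read the logical shard straight out of the ID."""
    return (id_value >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)


def host_for(shard_id: int, mapping: list[tuple[range, str]]) -> str:
    """Look up which physical database currently holds a logical shard."""
    for shard_range, host in mapping:
        if shard_id in shard_range:
            return host
    raise KeyError(shard_id)


# Hypothetical mapping tables: two physical databases at first, then a third
# one takes over the upper quarter of the logical shards.
BEFORE = [(range(0, 4096), "db-a"), (range(4096, 8192), "db-b")]
AFTER = [
    (range(0, 4096), "db-a"),
    (range(4096, 6144), "db-b"),
    (range(6144, 8192), "db-c"),
]

photo_id = pack(1_000, 7000, 42)

# The ID is unchanged by the move; only the lookup table is different.
old_host = host_for(shard_of(photo_id), BEFORE)
new_host = host_for(shard_of(photo_id), AFTER)

# Wraparound: the 1,025th insert in the same millisecond on the same shard
# reuses sequence slot 0 and so repeats the first ID.
collides = pack(1_000, 7000, 0) == pack(1_000, 7000, 1024)
```

The mapping table is the only thing that changes when a logical shard moves, which is why existing IDs and foreign keys stay valid as physical capacity grows.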
Connections
- Directly relevant to systems design interviews: unique ID generation is a classic question, covered in Alex Xu’s System Design Interview
- Related to Twitter’s Snowflake (mentioned in the article itself): Snowflake uses similar bit-packing but runs as a separate service that relies on ZooKeeper for coordination
- The Redis article in this vault: Redis is sometimes used for distributed ID counters; Instagram’s approach is simpler because it relies on Postgres sequences and adds no extra service
Raw Excerpt
“Each ID consists of 41 bits for time in milliseconds, 13 bits for the logical shard ID, and 10 bits for an auto-incrementing sequence modulus 1024. The result is a 64-bit integer that is time-sortable, shard-identifiable, and generated entirely within PostgreSQL — no additional moving parts required.”