本文由 AI 分析生成
建立時間: 2020-04-27
Summary
EN: A Traditional Chinese article explaining the purpose and design choices around SQL primary keys. Covers why primary keys exist (uniqueness enforcement, row identification, foreign key targets), the difference between natural keys and surrogate/pseudokeys, compound primary keys, and the classic race condition bug when using MAX(id) instead of LAST_INSERT_ID() to retrieve the newly inserted row’s ID.
ZH: 本文以繁體中文解釋 SQL 主鍵的目的與設計選擇:主鍵的作用(唯一性、行識別、外鍵目標)、自然鍵與代理鍵/偽鍵的差異、複合主鍵,以及使用 MAX(id) 替代 LAST_INSERT_ID() 取得新插入行 ID 時的經典競態條件問題。
Key Points
- Primary key purposes: enforce row uniqueness, enable efficient row lookup, serve as foreign key target
- Natural key: using a real-world identifier (e.g., Taiwan ID number) — risk of changes or non-uniqueness over time
- Surrogate/pseudokey: auto-incremented integer with no business meaning — preferred for stability
- Compound primary keys: combine multiple columns; used in junction tables (many-to-many relationships)
- Race condition:
MAX(id)returns the global maximum — another concurrent insert can steal it;LAST_INSERT_ID()returns the current connection’s last inserted ID — thread-safe - Primary key doesn’t have to be named
id— naming is convention, not a SQL requirement
Insights
- The
MAX(id)vsLAST_INSERT_ID()race condition is a real production bug many junior developers introduce — the article’s explanation is a valuable code review reference - Natural keys feel intuitive but violate the principle that primary keys should never change — government IDs get corrected, phone numbers change, emails change
- Compound primary keys in junction tables are the “correct” normalization pattern but many ORMs encourage adding a surrogate
idcolumn to junction tables for easier querying
Connections
- Directly related to Instagram’s ID sharding article: both address the challenge of unique ID generation, at different scales
- Connects to the SQL vs Redis article: both discuss data storage fundamentals
- The
LAST_INSERT_ID()concurrency issue connects to Rainbow Deploys — both are about isolation under concurrent operations
Raw Excerpt
“不要用
MAX(id)取得剛插入的 ID,這在高並發情境下會產生競態條件:你取到的可能是別人剛插入的 ID。應使用LAST_INSERT_ID(),它只返回當前連線最後一次插入的 ID,不受其他並發連線影響。”