Summary

Google Cloud Architecture Center’s comprehensive guide to database migration concepts: covering terminology, homogeneous vs. heterogeneous migration, downtime tolerance tradeoffs, CDC vs. differential querying, data transformation patterns, and common pitfalls (order violations, consistency issues, missing/duplicate data).

Google Cloud 架構中心對資料庫遷移概念的全面指南,涵蓋術語定義、同質與異質遷移、停機容忍度取捨、CDC 與差異查詢方法、數據轉換模式以及常見陷阱。

Key Points

  • Homogeneous migration: same DBMS (e.g., MySQL → MySQL); schemas often identical, can use native replication
  • Heterogeneous migration: different DBMS (e.g., Oracle → Spanner); requires schema/data transformation
  • CDC (Change Data Capture) via transaction logs is preferred over differential querying for minimal load impact
  • Downtime spectrum: zero (near-zero) → minimal → significant; each enables simpler migration tooling
  • Migration cardinality: 1:1 (direct), n:1 (consolidation), 1:n (distribution), n:m (redistribution)
  • Key pitfalls: order violations in parallel migration, consistency violations from differential query timestamps, flashback after source DB failover

Insights

The “active-active migration” concept (both source and target accept writes simultaneously) is underappreciated but complex — it requires conflict resolution rules and is rarely the right choice. The insight that migration consistency (complete, duplicate-free, ordered) is equivalent to saying “source and target state must be equivalent at migration end” cleanly captures what correctness means without prescribing implementation. The CDC-based approach through transaction logs (binary log in MySQL) is almost always preferable to polling because it’s already built into the database and imposes minimal additional load.

Connections

Raw Excerpt

CDC is the preferred approach for capturing changes in a database management system. CDC is built into the database itself and has the least load impact on the system.