This article was generated by AI analysis.
Created: 2018-02-13
Summary
Brandon Dimcheff describes how Olark achieved zero-downtime deployments for a stateful XMPP chat service on Kubernetes. Standard rolling deploys break long-lived connections, so Olark adopted “rainbow deploys”: each deployment gets a unique name derived from its git SHA (e.g., chat-olark-com-$SHA), the Kubernetes Service selector is switched to the new deployment, and the old deployment keeps running until its connections drain naturally, after which it is deleted.
Key Points
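- Standard Kubernetes rolling updates terminate old pods while they still hold open connections (SIGTERM, then SIGKILL once the termination grace period expires), which breaks stateful, long-lived connections such as XMPP or WebSocket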
- Rainbow deploy pattern: Deployment name = <service>-<git-sha>, e.g., chat-olark-com-abc1234
- Steps: create the new Deployment → wait for its pods to become ready → switch the Service selector to the new pods → wait for the old pods to drain → delete the old Deployment
- The drain period can be monitored via connection-count metrics
- Trade-off: the pod count temporarily doubles during the drain (or grows further if another deploy lands before the previous drain finishes)
- The git SHA in the deployment name makes rollback trivial as long as the previous Deployment has not yet been deleted: switch the Service selector back to its pods
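The naming scheme and selector switch above can be sketched as plain Kubernetes manifests. This is a minimal Python sketch under stated assumptions: the label keys `app` and `sha`, the container name `chat`, and the image path are hypothetical (the source only fixes the `<service>-<git-sha>` naming), and the functions only build dicts; they do not contact a cluster.

```python
import json


def deployment_name(service: str, git_sha: str) -> str:
    """Rainbow-deploy naming: one Deployment per commit, e.g. chat-olark-com-abc1234."""
    return f"{service}-{git_sha}"


def pod_labels(service: str, git_sha: str) -> dict:
    """Hypothetical label scheme: `app` identifies the service, `sha` the commit."""
    return {"app": service, "sha": git_sha}


def deployment_manifest(service: str, git_sha: str, image: str, replicas: int = 3) -> dict:
    """Build an apps/v1 Deployment for one commit; its pods carry the SHA label."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": deployment_name(service, git_sha), "labels": pod_labels(service, git_sha)},
        "spec": {
            "replicas": replicas,
            # The Deployment only manages pods belonging to its own commit.
            "selector": {"matchLabels": pod_labels(service, git_sha)},
            "template": {
                "metadata": {"labels": pod_labels(service, git_sha)},
                "spec": {"containers": [{"name": "chat", "image": image}]},  # hypothetical container name
            },
        },
    }


def selector_patch(service: str, git_sha: str) -> dict:
    """Body for patching the Service so new traffic goes to one commit's pods.

    Applying it with the new SHA is the cutover; applying it with the previous
    SHA is the rollback, which only works while that Deployment still exists.
    """
    return {"spec": {"selector": pod_labels(service, git_sha)}}


new = deployment_manifest("chat-olark-com", "abc1234", "registry.example.com/chat:abc1234")
print(new["metadata"]["name"])  # chat-olark-com-abc1234
print(json.dumps(selector_patch("chat-olark-com", "abc1234")))
# {"spec": {"selector": {"app": "chat-olark-com", "sha": "abc1234"}}}
```

Each deploy creates a new Deployment instead of updating an existing one, so older SHAs keep running untouched. The patch body could be applied with `kubectl patch service <service-name> -p '<json>'`; re-applying it with the previous SHA is the rollback.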
Insights
- The pattern separates “running code” from “receiving traffic”; this fundamental principle is what makes many deployment strategies possible
- Rainbow deploys are essentially blue/green deploys with infinite colors — each SHA is a new “color,” and you can have multiple versions live simultaneously
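- The approach works because a Kubernetes Service picks its backend pods with a label selector: redirecting new traffic is a single update to the Service's selector, independent of how many pods are running (O(1)), while already-established connections typically stay on the pods they were opened to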
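To illustrate why the switch is cheap and why existing connections survive it, here is a toy Python model (not how kube-proxy is actually implemented): a Service picks its endpoints by label selector, new connections go only to the current endpoints, and connections that are already open stay where they are. Pod names, labels, and the round-robin choice are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    labels: dict
    open_connections: int = 0


@dataclass
class Service:
    selector: dict

    def endpoints(self, pods: list) -> list:
        """Pods whose labels include every key/value pair of the selector."""
        return [p for p in pods if all(p.labels.get(k) == v for k, v in self.selector.items())]


def connect(service: Service, pods: list, i: int) -> Pod:
    """Open a new connection on a current endpoint (toy round-robin, not kube-proxy's algorithm)."""
    backends = service.endpoints(pods)
    pod = backends[i % len(backends)]
    pod.open_connections += 1
    return pod


old = {"app": "chat-olark-com", "sha": "abc1234"}
new = {"app": "chat-olark-com", "sha": "def5678"}
pods = [
    Pod("chat-olark-com-abc1234-a", dict(old)),
    Pod("chat-olark-com-abc1234-b", dict(old)),
    Pod("chat-olark-com-def5678-a", dict(new)),
]
svc = Service(selector=dict(old))

for i in range(4):  # clients connect while the old SHA receives traffic
    connect(svc, pods, i)

svc.selector = dict(new)  # the cutover: one write, regardless of pod count

for i in range(3):  # only new connections follow the new selector
    connect(svc, pods, i)

print([(p.name, p.open_connections) for p in pods])
# [('chat-olark-com-abc1234-a', 2), ('chat-olark-com-abc1234-b', 2), ('chat-olark-com-def5678-a', 3)]
```

The old pods keep their two connections each after the cutover; they simply stop receiving new ones, which is exactly the state the drain step waits out.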
Connections
- Related to Instagram sharding: both use immutable identifiers (git SHA / shard ID) to make operations predictable and reversible
- Connects to the systems design interview resources — rainbow/blue-green deploys are a classic system design topic
- The “drain then delete” pattern appears in service mesh (graceful termination) and queue-based architectures
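The “drain then delete” step can be sketched as a polling loop. This is a minimal Python sketch, assuming a hypothetical `connection_count` callback (e.g. backed by the connection-count metrics mentioned above) and a hypothetical `delete` callback (e.g. wrapping `kubectl delete deployment <name>`). The timeout is an added assumption; when it expires the loop leaves the Deployment running rather than deleting it, so no connection is forcibly terminated.

```python
import time
from typing import Callable


def drain_then_delete(
    deployment: str,
    connection_count: Callable[[str], int],  # hypothetical: reads open connections for a Deployment
    delete: Callable[[str], None],           # hypothetical: deletes the Deployment
    poll_seconds: float = 30.0,
    timeout_seconds: float = 24 * 3600.0,    # assumed upper bound on waiting
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Wait until `deployment` has no open connections, then delete it.

    Returns True if the Deployment was deleted, False if the timeout expired
    first (the Deployment is left running for an operator to decide).
    """
    waited = 0.0
    while True:
        if connection_count(deployment) == 0:
            delete(deployment)
            return True
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds


# Usage with fake callbacks: the connection count drops 5 -> 2 -> 0.
counts = iter([5, 2, 0])
deleted = []
ok = drain_then_delete("chat-olark-com-abc1234", lambda name: next(counts), deleted.append, sleep=lambda s: None)
print(ok, deleted)  # True ['chat-olark-com-abc1234']
```

Injecting `sleep` keeps the loop testable without real waiting; in production it would stay `time.sleep`.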
Raw Excerpt
“We name our deployments chat-olark-com-$GIT_SHA. When we deploy, we create the new deployment, wait for pods to be ready, then switch the Service’s label selector to point at the new deployment. The old deployment keeps running until all connections drain — only then do we delete it. No connections are ever forcibly terminated.”