This article was generated by AI analysis
Created: 2026-03-28 Source: https://geocld.github.io/2022/06/10/cas/
Summary
Li Jiahao explains Content-Addressed Storage (CAS), a storage model where content is retrieved by its hash rather than by its location. Unlike location-based addressing, which breaks when files are moved or deleted, CAS guarantees content integrity, enables deduplication, and provides O(1) lookup. The article introduces the core concept using hash tables as an analogy and explores real-world uses including Git, npm, IPFS, and Docker layers.
Key Points
- CAS stores hash(content) → content; content is immutable and identity is derived from content, not location
- O(1) lookup efficiency (hash table); content integrity guaranteed by hash algorithm
- Traditional location-based storage breaks when files move; CAS is immune to location changes
- Git uses CAS: each commit, tree, and blob is stored under its SHA-1 hash (SHA-256 in repositories using the newer object format)
- The npm package cache, Docker layer caching, and IPFS all use CAS principles
- Deduplication is automatic: identical content produces an identical hash, so it is stored only once
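The points above can be sketched as a toy in-memory store. This is a minimal illustration, not how Git, npm, or IPFS implement it: the class name ContentStore and its methods are hypothetical, and SHA-256 is an assumed choice of hash function.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        # Maps hex digest -> content bytes.
        self._objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        """Store content under its own hash and return that hash as the key."""
        key = hashlib.sha256(content).hexdigest()
        # Identical content yields the same key, so it is kept only once.
        self._objects.setdefault(key, content)
        return key

    def get(self, key: str) -> bytes:
        """Fetch content by hash, re-hashing it to check integrity."""
        content = self._objects[key]  # average O(1) dict lookup
        if hashlib.sha256(content).hexdigest() != key:
            raise ValueError(f"object {key} is corrupted")
        return content

    def __len__(self) -> int:
        return len(self._objects)


store = ContentStore()
k1 = store.put(b"hello")
k2 = store.put(b"hello")   # duplicate content, same key, no new entry
k3 = store.put(b"hello!")  # different content, different key
```

Here k1 and k2 are equal and the store holds two objects, not three. Because the key is derived from the bytes, get can detect a corrupted object simply by hashing what it read and comparing against the key it was asked for.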
Insights
CAS elegantly solves the problem of mutable location references by making content the identity itself. The insight that "if content changes, the address changes" turns immutability from a constraint into a feature: a given address can only ever refer to one version of the content, so versioning and deduplication fall out of the addressing scheme. This is why Git's history is tamper-evident: changing any historical object changes its hash, and therefore the hash of every object that references it, directly or indirectly.
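The tamper-evidence argument can be illustrated with a simplified hash chain. This is a sketch under stated assumptions: object_id and build_chain are hypothetical helpers, the byte layout is invented for the example, and Git's real commit format (which also references trees, authors, and timestamps) is not reproduced here.

```python
import hashlib


def object_id(parent_id: str, payload: bytes) -> str:
    """Hash a payload together with its parent's ID (simplified, not Git's format)."""
    return hashlib.sha256(parent_id.encode() + b"\0" + payload).hexdigest()


def build_chain(payloads: list[bytes]) -> list[str]:
    """Return the ID of each object, where every object references the previous one."""
    ids: list[str] = []
    parent = ""  # the first object has no parent
    for payload in payloads:
        parent = object_id(parent, payload)
        ids.append(parent)
    return ids


original = build_chain([b"first", b"second", b"third"])
tampered = build_chain([b"first", b"SECOND", b"third"])  # middle object altered
```

The first IDs match because nothing before the edit changed. The second and third IDs differ, even though the third payload is byte-for-byte the same, because its ID incorporates the ID of its altered parent.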
Connections
Raw Excerpt
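The essence of content-addressed storage: the value can be understood as the actual content of the file, and content is addressed through key-value pairs. Algorithmically, the lookup takes only O(1) time, which makes it very efficient!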