
Improving Storage Efficiency in Magic Pocket, Our Immutable Blob Store
Why It Matters
Lowering storage overhead at exabyte scale directly cuts raw capacity costs while preserving performance, underscoring the need for adaptive compaction in immutable storage systems.
Key Takeaways
- Immutable blob stores suffer fragmentation without active compaction.
- L2 uses dynamic programming to pack sparse volumes into near-full destinations.
- L3 streams under-filled volumes through Live Coder for rapid space reclamation.
- Dynamic eligibility thresholds automate compaction tuning.
- Storage overhead dropped 30-50% within days of rollout.
Pulse Analysis
Immutable storage systems like Dropbox’s Magic Pocket must balance durability with efficient space use. Because blobs are never overwritten, deletes leave behind unused fragments that, if not reclaimed, cause volumes to become partially filled and dramatically increase the raw capacity required to store active data. At the exabyte level, even a few percent of extra overhead translates into millions of dollars of hardware and power costs. Compaction—garbage collection followed by physical consolidation—has therefore become a critical operational function, especially when redundancy schemes such as erasure coding add their own storage overhead.
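To make the fragmentation cost concrete, here is a minimal sketch of how partially filled volumes inflate raw capacity. The function, fill fractions, and erasure-coding factor are illustrative assumptions, not Dropbox's actual numbers.

```python
# Illustrative sketch: raw capacity consumed per byte of live data,
# given per-volume fill levels. All values here are hypothetical.

def raw_overhead(fill_fractions, ec_factor=1.5):
    """Raw units needed per unit of live data.

    fill_fractions: live-data fraction of each fixed-size volume (0..1).
    ec_factor: redundancy expansion, e.g. 1.5 for a 1.5x erasure-coding scheme.
    """
    volumes = len(fill_fractions)      # each volume occupies 1 unit of raw space
    live = sum(fill_fractions)         # units of live data actually stored
    return volumes * ec_factor / live  # raw units per live unit

# A healthy fleet: volumes about 90% full.
print(round(raw_overhead([0.9] * 100), 2))                # 1.67

# After heavy deletes: half the volumes only 5% full.
print(round(raw_overhead([0.9] * 50 + [0.05] * 50), 2))   # 3.16
```

Even though half the volumes in the second case still hold data, the nearly empty ones almost double the raw capacity required per live byte, which is exactly the overhead compaction exists to reclaim.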
When a new Live Coder service unintentionally generated volumes that were less than five percent full, the existing steady-state compaction (L1) could not keep pace. Dropbox responded with a layered strategy: L2 treats the under-filled volume set as a bounded packing problem, using dynamic programming to combine multiple sparse sources into a near-full destination. L3 repurposes the Live Coder pipeline as a continuous re-encoding stream, draining the emptiest volumes with minimal data movement. Both approaches are gated by a dynamic eligibility-threshold loop that automatically adjusts compaction aggressiveness based on real-time overhead metrics, eliminating the need for manual tuning.
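The L2 packing step can be sketched as a subset-sum dynamic program: given the live-data sizes of candidate sparse volumes, pick the subset that fills one destination volume as completely as possible without overflowing. This is an illustrative formulation of the bounded packing problem; Dropbox's actual algorithm and data model are not public in this level of detail.

```python
# Illustrative subset-sum DP for combining sparse source volumes into one
# near-full destination. Sizes are in arbitrary integer units.

def pack_volumes(live_sizes, capacity):
    """Return (indices, total) of source volumes that best fill `capacity`."""
    # best[s] = list of volume indices whose live data sums to s (or None)
    best = [None] * (capacity + 1)
    best[0] = []
    for i, size in enumerate(live_sizes):
        # Iterate sums downward so each volume is used at most once.
        for s in range(capacity, size - 1, -1):
            if best[s] is None and best[s - size] is not None:
                best[s] = best[s - size] + [i]
    # Choose the fullest achievable packing.
    for s in range(capacity, -1, -1):
        if best[s] is not None:
            return best[s], s

chosen, total = pack_volumes([30, 50, 45, 20, 70], capacity=100)
print(chosen, total)  # [0, 1, 3] 100 -- volumes of 30 + 50 + 20 fill it exactly
```

The DP runs in O(volumes x capacity) time, which is tractable when sizes are bucketed into coarse units, and it directly produces destinations that are as close to full as the candidate set allows.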
The result is a rapid reclamation of space—30‑50% lower overhead within a week—and a more resilient storage fleet that can adapt to workload shifts without jeopardizing metadata or network resources. For other cloud providers and enterprises operating immutable blob stores, the lesson is clear: a single heuristic compaction model is insufficient at scale. Implementing multi‑tiered, data‑aware compaction pipelines and automated control loops can deliver substantial cost savings while preserving the durability guarantees that modern applications demand.
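One way to picture such an automated control loop is a simple feedback rule: volumes below a fill-fraction threshold are eligible for compaction, and the threshold itself moves with measured overhead. The function name, target, and step sizes below are hypothetical, meant only to show the shape of the loop, not Dropbox's actual controller.

```python
# Illustrative dynamic eligibility-threshold loop. A volume with fill
# fraction below `threshold` is eligible for compaction; the threshold
# tracks a target overhead. All parameters are hypothetical.

def adjust_threshold(threshold, observed_overhead, target_overhead,
                     step=0.05, lo=0.05, hi=0.95):
    """Nudge the eligibility threshold toward the overhead target.

    Overhead above target: raise the threshold so more (fuller) volumes
    become eligible and compaction gets more aggressive. Below target:
    lower it to save compaction I/O and network bandwidth.
    """
    if observed_overhead > target_overhead:
        threshold += step
    else:
        threshold -= step
    return min(hi, max(lo, threshold))

threshold = 0.25
for overhead in [2.4, 2.2, 1.9, 1.6, 1.5]:  # overhead trending down after rollout
    threshold = adjust_threshold(threshold, overhead, target_overhead=1.7)
print(round(threshold, 2))  # 0.3 -- the loop backs off as overhead falls
```

The appeal of this structure is that operators set a target overhead once, and the system self-tunes its aggressiveness as workloads shift, rather than requiring manual threshold changes.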