
Flash scarcity drives up AI infrastructure costs while delivering record earnings to memory manufacturers, reshaping the economics of data‑center scaling.
The memory market’s roller‑coaster ride has entered a new phase. After the pandemic‑fuelled surge and the sharp 2022 price collapse, DRAM and flash inventories piled up unsold. Demand has now roared back on the strength of generative AI workloads, which need not only high‑bandwidth memory (HBM) but also terabytes of persistent flash per GPU. Because fab capacity cannot be expanded overnight, manufacturers are allocating existing silicon to the most lucrative AI‑driven orders, pushing prices to multi‑year highs.
Nvidia’s reference architecture illustrates why flash has become mission‑critical. Its G1 tier (HBM) and G2 tier (server DRAM) feed the compute engines, while the G3 tier holds the intermediate checkpoint data essential for long‑running training jobs. The forthcoming G3.5 tier adds inference‑context memory served by BlueField‑4 DPUs, inflating flash usage further. With Nvidia recommending roughly 15 TB of G3 storage per GPU plus 30 TB of external storage, a single 1‑GW AI installation can consume up to 25 exabytes of flash, a figure that scales to hundreds of exabytes as GPU shipments grow.
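To see where the 25‑exabyte figure comes from, here is a minimal back‑of‑the‑envelope sketch. The 15 TB and 30 TB per‑GPU figures are Nvidia’s recommendations quoted above; the ~1.8 kW all‑in facility power per GPU is an illustrative assumption, not a published spec.

```python
# Back-of-the-envelope flash sizing for a 1 GW AI installation.
# Per-GPU storage figures come from the article; the all-in power
# budget per GPU (compute plus cooling and networking overhead)
# is an assumed illustrative value.

G3_TB_PER_GPU = 15        # Nvidia-recommended G3 checkpoint storage
EXTERNAL_TB_PER_GPU = 30  # Nvidia-recommended external storage
KW_PER_GPU_ALL_IN = 1.8   # assumed facility power per GPU, incl. overhead

SITE_POWER_KW = 1_000_000  # 1 GW expressed in kW

gpus = SITE_POWER_KW / KW_PER_GPU_ALL_IN
flash_tb = gpus * (G3_TB_PER_GPU + EXTERNAL_TB_PER_GPU)
flash_eb = flash_tb / 1_000_000  # 1 EB = 1,000,000 TB (decimal units)

print(f"GPUs supported: {gpus:,.0f}")   # ~555,600 GPUs
print(f"Total flash:    {flash_eb:.1f} EB")  # ~25 EB
```

Under those assumptions, roughly 555,000 GPUs at 45 TB of flash each lands on the 25 EB total; a lower per‑GPU power budget would push the figure even higher.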
Looking ahead, flash demand is set to eclipse supply well into 2026, driving continued price appreciation and solidifying memory makers’ profit margins. Suppliers may respond by expanding fab lines, adopting advanced packaging to improve yields, or shifting production toward higher‑value SSDs. Meanwhile, hyperscalers will need to balance storage costs against performance gains, possibly revisiting tiered‑storage strategies or investing in alternative non‑volatile memory technologies. The interplay of supply constraints, soaring AI workloads, and evolving architecture will define the next chapter of the flash market.
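As a rough illustration of why tiered storage is back on the table, the sketch below compares per‑GPU storage cost at different flash/HDD splits. The dollar figures are hypothetical placeholders, not market quotes; the point is the shape of the trade‑off, not the absolute numbers.

```python
# Illustrative tiered-storage cost model: keep the frequently
# accessed "hot" fraction on flash, spill the rest to nearline HDD.
# All prices are assumed placeholder values.

FLASH_USD_PER_TB = 80   # assumed enterprise SSD $/TB
HDD_USD_PER_TB = 15     # assumed nearline HDD $/TB

def storage_cost_usd(total_tb: float, hot_fraction: float) -> float:
    """Cost of holding `hot_fraction` of the data on flash, rest on HDD."""
    hot_tb = total_tb * hot_fraction
    cold_tb = total_tb - hot_tb
    return hot_tb * FLASH_USD_PER_TB + cold_tb * HDD_USD_PER_TB

total = 45.0  # TB per GPU, from the article's G3 + external figures
for hot in (1.0, 0.5, 0.2):
    print(f"hot fraction {hot:.0%}: ${storage_cost_usd(total, hot):,.0f} per GPU")
# 100% -> $3,600   50% -> $2,138   20% -> $1,260 (at these assumed prices)
```

Even with flash at only a few times the cost of HDD per terabyte, moving colder data off flash cuts the per‑GPU storage bill substantially, which is exactly the calculus hyperscalers will be running as prices climb.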