Flash Getting Stacked High-Bandwidth Version

Semiconductor Engineering
May 14, 2026

Why It Matters

By placing large, static AI model weights directly next to the processor, high‑bandwidth flash (HBF) can sharply reduce data‑movement costs and reshape accelerator design for data‑center inference workloads.

Key Takeaways

  • Sandisk's HBF targets up to 3 TB per stack, 8‑16× the capacity of HBM
  • Read bandwidth targets 1.6 TB/s, with power and footprint matching HBM4
  • Samples expected in H2 2026; inference accelerators using HBF slated for early 2027
  • Designed to hold AI inference weights, not for training, because of flash's write limits
  • Being standardized through the Open Compute Project (OCP) with SK Hynix to drive industry adoption
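The write-limit point can be made concrete with an endurance budget. The following Python sketch is a back‑of‑envelope illustration only: the rated program/erase (P/E) cycle count and both write cadences are hypothetical placeholders, not Sandisk HBF specifications, and the model assumes perfect wear leveling with no write amplification. Only the 512 GB per‑stack capacity comes from the article.

```python
"""Back-of-envelope flash write-endurance budget.

All values except the 512 GB capacity are hypothetical placeholders,
not Sandisk HBF specifications.
"""

GB = 1e9


def lifetime_days(capacity_bytes: float, pe_cycles: float, bytes_written_per_day: float) -> float:
    """Days until the rated program/erase budget is used up.

    Assumes perfect wear leveling (every cell wears evenly) and no write
    amplification, so the total budget is simply capacity x P/E cycles.
    """
    total_write_budget = capacity_bytes * pe_cycles
    return total_write_budget / bytes_written_per_day


capacity = 512 * GB   # per-stack capacity quoted in the article
pe_cycles = 3_000     # hypothetical rated P/E cycles

# Inference: rewrite the full weight set once per week (hypothetical cadence).
inference_daily = capacity / 7
# Training: rewrite the full weight set on every step, 10,000 steps/day (hypothetical).
training_daily = capacity * 10_000

print(f"inference: {lifetime_days(capacity, pe_cycles, inference_daily):,.0f} days")
print(f"training:  {lifetime_days(capacity, pe_cycles, training_daily):,.2f} days")
```

With these placeholder numbers, a weekly weight refresh leaves decades of headroom, while rewriting the weights on every training step would exhaust the budget in under a day. The exact figures matter less than the orders‑of‑magnitude gap between the two write patterns.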

Pulse Analysis

High‑bandwidth flash (HBF) represents a strategic pivot in the memory hierarchy for artificial‑intelligence inference. Traditional designs rely on a cascade of storage tiers (SSD, DRAM, and HBM) to shuttle billions of model parameters toward the processor, incurring latency penalties each time data moves from one tier to the next. By co‑packaging a 16‑die NAND stack with the GPU, HBF collapses this chain, delivering read bandwidth comparable to HBM while offering terabyte‑scale capacity. This proximity not only cuts weight‑fetch latency but also reduces the power spent moving data across board‑level interconnects, a critical factor as data‑center operators chase efficiency gains.

The commercial implications are significant. AI inference servers, a large and growing share of cloud workloads, often run the same static model thousands of times, making read‑optimized memory a natural fit. HBF's 1.6 TB/s read bandwidth and 512 GB per stack allow many large‑language‑model weight sets to reside entirely on‑package, reducing the need for costly DRAM caching layers. While flash's slow writes and limited write endurance preclude its use for training, the trade‑off is acceptable for inference‑only deployments, where weight updates are infrequent. Moreover, building on NAND gives HBF a mature manufacturing ecosystem, helping keep unit costs competitive against emerging non‑volatile memories (NVMs) such as MRAM or RRAM.
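A quick sizing calculation shows what the quoted per‑stack figures imply. This Python sketch uses the 512 GB capacity and 1.6 TB/s read bandwidth from the paragraph above; the model sizes and precisions are hypothetical examples, and it assumes dense weights striped evenly across stacks, each sustaining its peak read bandwidth. The read time is a lower bound on one full pass over the weights, such as a memory‑bound decode step of a dense model at batch size 1, ignoring compute and any caching.

```python
import math

GB = 1e9
TB = 1e12

# Per-stack figures quoted in the article.
STACK_CAPACITY = 512 * GB   # bytes
STACK_READ_BW = 1.6 * TB    # bytes per second


def weight_bytes(num_params: float, bytes_per_param: float) -> float:
    """Size of a dense weight set in bytes."""
    return num_params * bytes_per_param


def stacks_needed(num_bytes: float, stack_capacity: float = STACK_CAPACITY) -> int:
    """Smallest number of stacks whose combined capacity holds the weights."""
    return math.ceil(num_bytes / stack_capacity)


def full_read_seconds(num_bytes: float, stacks: int, bw_per_stack: float = STACK_READ_BW) -> float:
    """Lower bound on the time to read every weight once, assuming the weights
    are striped evenly across stacks and each stack runs at peak read bandwidth."""
    return num_bytes / (stacks * bw_per_stack)


# Hypothetical example models: (label, parameter count, bytes per parameter).
EXAMPLES = [
    ("hypothetical 70B model, 8-bit weights", 70e9, 1),
    ("hypothetical 405B model, FP16 weights", 405e9, 2),
]

if __name__ == "__main__":
    for label, params, bpp in EXAMPLES:
        size = weight_bytes(params, bpp)
        n = stacks_needed(size)
        t = full_read_seconds(size, n)
        print(f"{label}: {size / GB:,.0f} GB -> {n} stack(s), full read >= {t * 1e3:.1f} ms")
```

Because every decode step of a dense model touches all of its weights, per‑stack read bandwidth, not only capacity, bounds how quickly HBF‑resident weights can be served.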

Standardization through the Open Compute Project (OCP) accelerates HBF's path to market by fostering a collaborative, rapid‑iteration environment that aligns with AI's fast‑moving requirements. The partnership with SK Hynix adds credibility and supply‑chain depth, positioning HBF as a viable alternative to HBM for next‑generation accelerators. As AI models continue to scale, memory bandwidth and capacity are becoming the primary bottlenecks; HBF offers a pragmatic answer that could redefine how data‑center architects balance performance, power, and cost.
