
By shifting focus to scalable, replaceable memory, enterprises can lower AI inference latency and improve CSP profitability without over‑investing in expensive GPU silicon.
The AI hardware conversation has long centered on ever more powerful GPUs, yet real‑world deployments reveal that insufficient memory stalls model execution, especially during the critical time‑to‑first‑token phase. When a system lacks enough DRAM, it must repeatedly recompute KV‑cache data, inflating latency and degrading the user experience. By treating SSD capacity as a supplemental memory pool, Phison's aiDAPTIV+ architecture relieves DRAM pressure: the GPU keeps its compute pipeline busy while storage serves previously computed cache entries instead of forcing a recompute. The approach resembles the way a web browser caches data locally to avoid refetching it, only implemented at the hardware level, and it can dramatically cut inference delays.
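The tiering idea described above can be sketched in a few lines. This is a minimal illustrative model, not Phison's implementation: a bounded in-memory dict stands in for DRAM, files on disk stand in for the SSD tier, and a miss in both tiers signals that the caller must recompute. All names (`TieredKVCache`, `dram_slots`, etc.) are invented for this sketch.

```python
import os
import pickle
import tempfile
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier KV cache: a bounded in-memory dict ("DRAM") that
    spills least-recently-used entries to files on disk ("SSD")."""

    def __init__(self, dram_slots, spill_dir=None):
        self.dram_slots = dram_slots
        self.dram = OrderedDict()                 # hot tier, LRU order
        self.spill_dir = spill_dir or tempfile.mkdtemp()

    def _spill_path(self, key):
        # keys are assumed filename-safe in this sketch
        return os.path.join(self.spill_dir, f"{key}.kv")

    def put(self, key, value):
        self.dram[key] = value
        self.dram.move_to_end(key)
        while len(self.dram) > self.dram_slots:   # evict coldest entry to "SSD"
            cold_key, cold_val = self.dram.popitem(last=False)
            with open(self._spill_path(cold_key), "wb") as f:
                pickle.dump(cold_val, f)

    def get(self, key):
        if key in self.dram:                      # DRAM hit: cheapest path
            self.dram.move_to_end(key)
            return self.dram[key]
        path = self._spill_path(key)
        if os.path.exists(path):                  # SSD hit: slower, but no recompute
            with open(path, "rb") as f:
                val = pickle.load(f)
            self.put(key, val)                    # promote back into DRAM
            return val
        return None                               # miss: caller must recompute
```

The payoff is the middle branch of `get`: an SSD read is orders of magnitude slower than DRAM but still far cheaper than regenerating KV-cache tensors on the GPU, which is precisely the trade the article describes.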
Phison's strategy extends beyond software techniques to hardware innovation. The company announced a 244‑terabyte enterprise SSD built by combining 32 packages of 16‑die stacked NAND, and it is preparing for future 4‑terabit dies that could halve the stack depth. Such extreme‑capacity drives let cloud service providers amass the massive data reservoirs that inference workloads require, directly linking storage volume to revenue. Phison is also monitoring PLC (five‑bit‑per‑cell) NAND, ready to adopt it once manufacturers achieve reliable yields, keeping its SSD roadmap ahead of emerging memory technologies.
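The "halve the stack depth" claim follows from simple arithmetic: a stack's capacity is dies-per-stack times die density, so doubling the density of each die halves the number of dies needed for the same capacity. The numbers below are illustrative assumptions, not Phison's published spec.

```python
def dies_needed(target_tbit: int, die_density_tbit: int) -> int:
    """Dies required in one stack to reach a target capacity,
    assuming capacity = dies * die_density (both in terabits)."""
    return target_tbit // die_density_tbit

# Illustrative: a 32-terabit stack of 2-terabit dies needs 16 dies;
# moving to 4-terabit dies cuts the same stack to 8 dies.
assert dies_needed(32, 2) == 16
assert dies_needed(32, 4) == 8
```

Fewer dies per stack means simpler packaging and thermals for the same drive capacity, which is why denser dies matter even when total capacity is unchanged.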
From a business perspective, the shift toward memory‑centric AI infrastructure reshapes capital allocation. CSPs have poured over $200 billion into GPUs, yet profits derive from the storage‑driven inference pipeline. By decoupling memory from GPU silicon, operators can purchase fewer, compute‑optimized GPUs and rely on modular SSD upgrades, mitigating the risk of costly GPU obsolescence caused by NAND wear‑out. This modularity not only extends hardware lifecycles but also aligns with sustainability goals, positioning memory‑first designs as the pragmatic path for scalable, cost‑effective AI deployment.