Weka's Bold Bet: Can Flash Storage Replace GPU Memory for AI?

The Motley Fool
Mar 12, 2026

Why It Matters

By turning inexpensive flash storage into GPU‑level memory, Weka cuts AI compute costs and eases DRAM shortages, accelerating the deployment of larger, more capable models across the industry.

Key Takeaways

  • Weka’s software turns flash storage into low‑latency GPU memory.
  • NVMe and 400‑800 Gbps Ethernet enable memory‑like data access.
  • Neural Mesh provides a unified, high‑speed file system across clouds.
  • Solution addresses AI memory scarcity and soaring DRAM prices.
  • Investors include Nvidia, Micron, and Samsung’s Catalyst Fund, signaling industry confidence.

Summary

The interview spotlights Weka’s ambitious strategy to use flash‑based storage as a substitute for traditional GPU memory in AI workloads. Leveraging NVMe‑connected NAND and ultra‑fast 400‑800 Gbps Ethernet or InfiniBand, the company’s software dynamically routes data to achieve latency levels indistinguishable from DRAM, effectively extending GPU memory capacity without the prohibitive cost of additional high‑bandwidth memory.
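To see why 400‑800 Gbps networking makes "memory‑like" access plausible, a rough back‑of‑envelope comparison helps. The figures below (an 800 Gbps link and ~3 TB/s of HBM bandwidth) are illustrative assumptions, not numbers from the interview:

```python
# Back-of-envelope arithmetic: time to move a 1 GiB block (e.g. a KV-cache
# segment) over 800 Gbps Ethernet versus reading it from local HBM.
# Bandwidth figures are assumed for illustration, not Weka's published specs.

def transfer_time_ms(size_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to move `size_bytes` at the given bandwidth, in milliseconds."""
    return size_bytes / bandwidth_bytes_per_s * 1e3

GIB = 2**30
ethernet_800g = 800e9 / 8   # 800 Gbps -> ~100 GB/s
hbm3 = 3e12                 # ~3 TB/s aggregate HBM bandwidth (assumed)

net_ms = transfer_time_ms(GIB, ethernet_800g)   # ~10.7 ms
hbm_ms = transfer_time_ms(GIB, hbm3)            # ~0.36 ms
print(f"800G Ethernet: {net_ms:.1f} ms, HBM: {hbm_ms:.2f} ms")
```

The gap is real but only tens of milliseconds for a gigabyte‑scale transfer, which is why clever software scheduling can hide it for bandwidth‑bound inference workloads.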

Liran Zvibel explains that the breakthrough lies in a software‑defined layer that monitors client behavior in real time, allocating storage resources where they are needed most. This approach delivers orders‑of‑magnitude cheaper memory‑like performance, crucial for inference and large‑context models where token processing is limited by memory bandwidth rather than GPU compute. The company’s flagship product, Neural Mesh, presents a shared file system that appears as a local drive while scaling across thousands of clients, supporting NFS, SMB, and S3 protocols on‑premises, in public clouds, and in emerging “neoclouds.”
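The general pattern behind such a software‑defined layer can be sketched as a two‑tier store: hot items live in a small, fast tier (standing in for DRAM/HBM) and cold items spill to a larger, slower tier (standing in for NVMe flash). This is a generic illustration of memory tiering, not Weka’s implementation; the class name and capacities are invented for the example:

```python
# Minimal sketch of software-defined memory tiering (illustrative only):
# hot items stay in a small DRAM-sized LRU map, cold items spill to a
# flash-backed store and are promoted back on access.
from collections import OrderedDict

class TieredStore:
    def __init__(self, dram_capacity: int):
        self.dram_capacity = dram_capacity
        self.dram = OrderedDict()   # fast tier, kept in LRU order
        self.flash = {}             # slow tier (stand-in for NVMe flash)

    def put(self, key, value):
        self.dram[key] = value
        self.dram.move_to_end(key)              # mark as most recently used
        while len(self.dram) > self.dram_capacity:
            cold_key, cold_val = self.dram.popitem(last=False)
            self.flash[cold_key] = cold_val     # evict LRU item to flash

    def get(self, key):
        if key in self.dram:                    # DRAM hit: cheap
            self.dram.move_to_end(key)
            return self.dram[key]
        value = self.flash.pop(key)             # flash hit: promote to DRAM
        self.put(key, value)
        return value

store = TieredStore(dram_capacity=2)
store.put("a", 1); store.put("b", 2); store.put("c", 3)  # "a" spills to flash
print("a" in store.flash)   # True
print(store.get("a"))       # 1 (promoted back to DRAM, evicting "b")
```

The real system must also hide flash’s higher latency with prefetching and parallelism across thousands of clients, but the tiering decision itself is exactly this kind of policy code, which is why it can ship as software rather than new hardware.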

Key moments include Zvibel’s claim that the latency gap between flash‑backed storage and actual RAM is now negligible, and that the system can be accessed via standard Ethernet, making it far more accessible than proprietary CXL solutions. Backed by Nvidia, Micron, Samsung Catalyst Fund, and Generation Investment Management, Weka positions itself as the go‑to data platform for the AI era, promising faster inference, larger context windows, and reduced reliance on scarce DRAM.

If widely adopted, Weka’s technology could reshape data‑center economics by lowering the cost per token for AI models, mitigating the current DRAM supply crunch, and enabling smaller players to run large language models without massive hardware investments. The shift from hardware‑centric memory expansion to software‑driven flash acceleration may become a new standard for AI infrastructure.

Original Description

Weka's CEO explains how Weka uses flash to extend GPU memory for low-latency AI inference. We examine the tech, market opportunity, and what investors should verify.
- What Weka does: presents low-latency flash as effective GPU memory and integrates with inference engines.
- Product stack: Neural Mesh (shared file system) and Axon (pooled GPU-server storage) explained.
- Market forces: shift from training to inference and DRAM/HBM scarcity driving demand for lower TCO.
- Partnerships and positioning: integrations with Nvidia, DPUs, cloud and OEM partners and enterprise features.
- Investor takeaways: validate performance claims, watch adoption signals and risks from hardware competitors.
