Penguin Solutions Introduces Industry’s First Production-Ready CXL-Based KV Cache Server

HPCwire, Mar 17, 2026

Key Takeaways

  • First production‑ready CXL KV cache server
  • Provides up to 11 TB memory capacity
  • Cuts inference latency and improves token throughput
  • Offers 10× faster access than NVMe storage
  • Compatible with NVIDIA Dynamo for KV offloading

Pulse Analysis

CXL’s emergence as a high‑speed, cache‑coherent interconnect is reshaping data‑center architecture, allowing memory to be disaggregated from compute nodes without sacrificing bandwidth. Penguin Solutions leverages this capability to create a dedicated KV cache tier that sits between GPU memory and NVMe storage, effectively extending the memory hierarchy. This approach mitigates the long‑standing "memory wall" that has limited inference performance, especially as large language models grow in parameter count and context length.

For enterprises deploying real‑time AI services, such as financial news parsing, retrieval‑augmented generation over massive regulatory filings, or conversational agents, the latency of each token matters. By offloading KV data to an 11 TB CXL pool, Penguin’s MemoryAI server reduces the number of costly GPU recompute cycles and shortens time‑to‑first‑token. Early benchmarks indicate up to 30 % lower GPU idle time and a measurable increase in tokens‑per‑second rates, translating directly into higher throughput and lower operational costs for inference clusters.
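To give a sense of scale for an 11 TB KV pool, the sketch below estimates how many tokens of cached context such a pool could hold. It uses the standard transformer KV cache sizing rule (one key and one value vector per layer per KV head). The model shape (80 layers, 8 KV heads, head dimension 128, 16‑bit values) is a hypothetical example, not a configuration from the announcement, and 11 TB is assumed to mean 11 × 10^12 bytes. Real capacity would be lower once allocator overhead and metadata are counted.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache needed to store one token of context.

    The leading factor of 2 accounts for storing both a key vector and a
    value vector for every layer and every KV head.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


def tokens_that_fit(pool_bytes, per_token_bytes):
    """Whole tokens of KV cache that fit in a pool of the given size."""
    return pool_bytes // per_token_bytes


# Hypothetical model shape (an assumption for illustration only).
PER_TOKEN = kv_bytes_per_token(num_layers=80, num_kv_heads=8, head_dim=128)

# Assumption: "11 TB" is read as decimal terabytes.
POOL_BYTES = 11 * 10**12

if __name__ == "__main__":
    print(f"KV cache per token: {PER_TOKEN / 1024:.0f} KiB")
    print(f"Tokens cached in pool: {tokens_that_fit(POOL_BYTES, PER_TOKEN):,}")
```

Under these assumptions each token costs 320 KiB of KV cache, so the pool holds roughly 33 million tokens of context across all sessions. Every cached token served from the pool is one the GPU does not have to recompute during prefill.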

The introduction of a production‑ready CXL KV cache also signals a shift in competitive dynamics. Vendors that previously relied on NVMe or purely on‑board HBM must now consider memory disaggregation to stay relevant. Penguin’s alignment with NVIDIA Dynamo further integrates the solution into existing AI software stacks, easing adoption. As more organizations seek to run larger models with tighter SLAs, CXL‑based memory expansion is poised to become a standard component of AI‑focused data centers, driving both hardware innovation and new pricing models.
