Startup Boosts Scale-Up to 1000+ GPUs in a Single Domain

Startup Boosts Scale-Up to 1000+ GPUs in a Single Domain

EE Times – Designlines/AI & ML
EE Times – Designlines/AI & MLMay 27, 2026

Companies Mentioned

Why It Matters

The ability to scale inference clusters beyond traditional 100‑GPU limits reduces per‑token cost and latency, giving cloud providers and enterprises a competitive edge in real‑time AI services. It also opens the door to heterogeneous accelerator mixes, expanding the utility of large‑scale GPU farms.

Key Takeaways

  • Delos enables >1,000 GPUs in a single scale‑up domain
  • Nonstop AI uses nine OSFP ports per GPU, 72×200 Gb/s
  • Modular disaggregation reduces latency for always‑on inference workloads
  • Mosaic software reroutes traffic instantly when cables or GPUs fail
  • Early deployments target Q4 2026, aiming at AI inference market

Pulse Analysis

The AI industry is moving from batch‑oriented training to always‑on inference, where nanosecond latency and continuous availability are paramount. Traditional scale‑out racks rely on multiple hops between GPUs, inflating latency and complicating resource orchestration. Companies that can deliver a tightly coupled, low‑latency fabric for inference gain a decisive advantage in applications such as recommendation engines, autonomous systems, and real‑time analytics.

Delos Data’s Nonstop AI tackles this challenge with a hardware‑first approach. Partnering with a Taiwanese OEM, the firm equips each accelerator with nine OSFP connectors, delivering a combined 72 × 200 Gb/s bandwidth per server. This front‑panel scale‑up architecture lets dozens of GPUs communicate directly, bypassing the slower inter‑rack links typical of scale‑out designs. The Mosaic software layer monitors link health and instantly re‑routes traffic if a cable is unplugged or a GPU fails, preserving throughput without manual intervention. By decoupling compute from a monolithic rack, the system also supports heterogeneous mixes of GPUs and other AI accelerators, offering customers the flexibility to tailor hardware to specific inference models.

The market implications are significant. Cloud providers and hyperscale data centers can now consider consolidating thousands of GPUs into a single logical domain, potentially lowering power‑per‑token costs and simplifying network management. Delos’s claim that 10,000‑GPU domains are feasible challenges Nvidia’s NVLink ceiling and could spur competitors to revisit their interconnect strategies. As early customers pilot the technology ahead of the Q4 2026 rollout, the industry will watch closely to see whether this modular, resilient scale‑up model becomes the new standard for high‑throughput AI inference.

Startup Boosts Scale-Up to 1000+ GPUs in a Single Domain

Comments

Want to join the conversation?

Loading comments...