The Neocloud Boom: State of AI Compute 2026 | Stephen Balaban

Data Driven NYC
Jun 18, 2026

Why It Matters

Neo‑clouds will dictate the pace of AI innovation and profitability, making them critical assets for investors, tech firms, and policymakers navigating the next wave of compute‑driven growth.

Key Takeaways

  • AI compute demand outpaces supply, keeping GPU prices high.
  • Neo‑clouds remain non‑commodity, requiring vertical integration and financing.
  • Lambda’s orchestration software enables clusters from 16 to 4,000 GPUs.
  • Land, power entitlement, and MEP infrastructure are primary bottlenecks.
  • Scaling laws expand the AI market, ensuring sustained compute demand.

Summary

The podcast with Lambda co‑founder and CTO Stephen Balaban examines the state of AI compute in 2026 and rejects the notion that GPU compute will become a commodity. Balaban argues that neo‑clouds, specialized AI‑focused cloud providers, are highly integrated operations that span land acquisition, construction, high‑performance computing design, software orchestration, and financing, making them a distinct business from traditional cloud services.

Key insights include persistent under‑building despite soaring demand for large language models, a nuanced view of GPU rental pricing in which both on‑demand and long‑term rates are rising, and the importance of financing innovations to fund gigawatt‑scale factories. Lambda’s proprietary one‑click orchestration platform can spin up clusters ranging from 16 to 4,000 GPUs via a web interface, a capability most rivals lack. The primary bottlenecks are land entitlement, megawatt power commitments, and mechanical‑electrical‑plumbing (MEP) infrastructure, not the GPUs themselves.

Balaban emphasizes that “we have an amazing system that can take in money and output software,” pointing to the scaling laws that keep expanding the addressable AI market, from customer‑support bots to software‑engineering augmentation. He also addresses community concerns about data‑center water use, noting that modern deployments use closed‑loop liquid cooling with dry coolers, which lose negligible water to evaporation, and that they can even help strengthen the grid.

The implications are clear: investors and enterprises must treat neo‑clouds as strategic, capital‑intensive assets rather than commoditized services. Multiple large players can coexist, mirroring the oligopolistic structure of traditional cloud markets, but success will hinge on superior stack integration, rapid construction, and proactive community engagement.

Original Description

Many people said GPU compute would become a commodity. The opposite happened — and a new category of "neoclouds" is now racing to build the physical backbone of the AI boom. Stephen Balaban, co-founder and CTO of Lambda, explains why the conventional wisdom was exactly wrong, why we're still massively underbuilding compute, and what it actually takes to stand up a gigawatt-scale AI factory: land, power, cooling, networking, and a financing stack most people have never heard of. We go deep on the physics of how energy becomes tokens, NVIDIA's real moat, why a 2023 GPU can lease for more today than the day it shipped, and Stephen's provocative vision of "neural software." Plus the wild Lambda origin story — from a facial recognition startup to a camera in a baseball cap to a near-billion-dollar cloud business. This is the state of AI compute in 2026, from inside one of the companies building it.
Stephen Balaban
Lambda
Matt Turck (Managing Director)
FirstMark
00:00 — Cold open
01:21 — Why GPU compute was never a commodity
02:45 — The H100 price index and what it gets wrong
04:02 — The real moat: technology or financing?
05:57 — Winner-take-all, or room for many neoclouds?
06:48 — Are we overbuilding or underbuilding AI compute?
09:26 — What if AI gets 10x more compute-efficient?
10:44 — The real bottleneck: land, power, and shell
11:38 — The backlash against data centers — and the misinformation
15:00 — Opening the hood: from photons to tokens
17:11 — Extracting more value from the same chip
19:26 — Frontier inference and distributed training, explained
23:26 — What actually drives compute cost
25:21 — Lambda's chip stack and the NVIDIA relationship
26:17 — A multi-silicon world? CUDA, cuDNN, and NVIDIA's real moat
28:59 — Networking, storage, and the one-click cluster
34:46 — Renting vs. owning, and full vertical integration
36:24 — How global is Lambda? Does location still matter?
38:44 — The financing stack: off-take agreements, SPVs, and credit
41:16 — Why a 2023 GPU leases for more today
42:36 — A futures market for compute?
43:54 — Origin story: facial recognition, Perceptio, and Apple
47:03 — The Lambda hat and Dream Scope
48:59 — The $60K bet that became a cloud business
52:00 — Holding the team together through the hard times
54:30 — Bringing on a new CEO; Stephen as CTO
57:33 — Matching xAI on high-velocity deployment
59:29 — "AI won't write software — it will become the software"
01:01:30 — Neural software vs. vibe coding
01:04:25 — Do agents change the compute layer?
01:06:14 — Self-assembling software inside Lambda
01:08:18 — Gigawatt-scale AI factories
01:08:57 — One person, one GPU
01:12:04 — Hot takes: overrated and underrated in AI
