
The Neocloud Boom: State of AI Compute 2026 | Stephen Balaban

June 18, 2026

Data Driven NYC


Why It Matters

Neo‑clouds will dictate the pace of AI innovation and profitability, making them critical assets for investors, tech firms, and policymakers navigating the next wave of compute‑driven growth.

Key Takeaways

•AI compute demand outpaces supply, keeping GPU prices high.
•Neo‑clouds remain non‑commodity, requiring vertical integration and financing.
•Lambda’s orchestration software enables clusters from 16 to 4,000 GPUs.
•Land, power entitlement, and mechanical, electrical, and plumbing (MEP) infrastructure are the primary bottlenecks.
•Scaling laws expand the AI market, sustaining compute demand.

Summary

The podcast with Lambda co‑founder and CTO Stephen Balaban examines the state of AI compute in 2026 and challenges the notion that GPU compute would become a commodity. Balaban argues that neo‑clouds (specialized, AI‑focused cloud providers) are highly integrated operations spanning land acquisition, construction, high‑performance computing design, software orchestration, and financing, which makes them a distinct business from traditional cloud services.

Key insights include persistent underbuilding despite soaring demand for large language models, a nuanced view of GPU rental pricing in which both on‑demand and long‑term rates are rising, and the importance of financing innovations to fund gigawatt‑scale AI factories. Lambda’s proprietary one‑click orchestration platform can spin up clusters ranging from 16 to 4,000 GPUs via a web interface, a capability most rivals lack. The primary bottlenecks are land entitlement, megawatt‑scale power commitments, and mechanical, electrical, and plumbing (MEP) infrastructure, not the GPUs themselves.

Balaban emphasizes that “we have an amazing system that can take in money and output software,” pointing to the scaling laws that keep expanding the addressable AI market, from customer‑support bots to software‑engineering augmentation. He also addresses community concerns about data‑center water use, noting that modern deployments use closed‑loop liquid cooling with dry coolers, which lose negligible water to evaporation, and that such facilities can even help strengthen the grid.

The implications are clear: investors and enterprises must treat neo‑clouds as strategic, capital‑intensive assets rather than commoditized services. Multiple large players can coexist, mirroring the oligopolistic structure of traditional cloud markets, but success will hinge on superior stack integration, rapid construction, and proactive community engagement.

Original Description

Many people said GPU compute would become a commodity. The opposite happened — and a new category of "neoclouds" is now racing to build the physical backbone of the AI boom. Stephen Balaban, co-founder and CTO of Lambda, explains why the conventional wisdom was exactly wrong, why we're still massively underbuilding compute, and what it actually takes to stand up a gigawatt-scale AI factory: land, power, cooling, networking, and a financing stack most people have never heard of. We go deep on the physics of how energy becomes tokens, NVIDIA's real moat, why a 2023 GPU can lease for more today than the day it shipped, and Stephen's provocative vision of "neural software." Plus the wild Lambda origin story — from a facial recognition startup to a camera in a baseball cap to a near-billion-dollar cloud business. This is the state of AI compute in 2026, from inside one of the companies building it.

Stephen Balaban

LinkedIn - https://www.linkedin.com/in/sbalaban

X/Twitter - https://x.com/stephenbalaban

Lambda

Website - https://lambda.ai

X/Twitter - https://x.com/LambdaAPI

Matt Turck (Managing Director)

Blog - https://mattturck.com

LinkedIn - https://www.linkedin.com/in/turck/

X/Twitter - https://x.com/mattturck

FirstMark

Website - https://firstmark.com

X/Twitter - https://x.com/FirstMarkCap

Listen on:

Spotify - https://open.spotify.com/show/7yLATDSaFvgJG80ACcRJtq

Apple - https://podcasts.apple.com/us/podcast/the-mad-podcast-with-matt-turck/id1686238724

00:00 — Cold open

01:21 — Why GPU compute was never a commodity

02:45 — The H100 price index and what it gets wrong

04:02 — The real moat: technology or financing?

05:57 — Winner-take-all, or room for many neoclouds?

06:48 — Are we overbuilding or underbuilding AI compute?

09:26 — What if AI gets 10x more compute-efficient?

10:44 — The real bottleneck: land, power, and shell

11:38 — The backlash against data centers — and the misinformation

15:00 — Opening the hood: from photons to tokens

17:11 — Extracting more value from the same chip

19:26 — Frontier inference and distributed training, explained

23:26 — What actually drives compute cost

25:21 — Lambda's chip stack and the NVIDIA relationship

26:17 — A multi-silicon world? CUDA, cuDNN, and NVIDIA's real moat

28:59 — Networking, storage, and the one-click cluster

34:46 — Renting vs. owning, and full vertical integration

36:24 — How global is Lambda? Does location still matter?

38:44 — The financing stack: off-take agreements, SPVs, and credit

41:16 — Why a 2023 GPU leases for more today

42:36 — A futures market for compute?

43:54 — Origin story: facial recognition, Perceptio, and Apple

47:03 — The Lambda hat and Dreamscope

48:59 — The $60K bet that became a cloud business

52:00 — Holding the team together through the hard times

54:30 — Bringing on a new CEO; Stephen as CTO

57:33 — Matching xAI on high-velocity deployment

59:29 — "AI won't write software — it will become the software"

01:01:30 — Neural software vs. vibe coding

01:04:25 — Do agents change the compute layer?

01:06:14 — Self-assembling software inside Lambda

01:08:18 — Gigawatt-scale AI factories

01:08:57 — One person, one GPU

01:12:04 — Hot takes: overrated and underrated in AI
