GPU Hoarding Is Over. The $401B Reality Check

VentureBeat (GamesBeat), May 13, 2026

Why It Matters

As AI moves from experimentation to production, unchecked GPU spend can erode margins; mastering efficiency and controlling a private stack is now a strategic imperative for sustainable growth.

Key Takeaways

  • Enterprises shifting from GPU hoarding to efficiency-driven utilization
  • Cost-per-inference and TCO concerns rose from 34% to 41% of respondents
  • LinkedIn’s private stack enables model pruning and kernel optimization
  • 72% of firms report insufficient control over AI infrastructure, fueling a move to private AI clouds
  • Instrumentation and ROI analysis now essential for sustainable AI spend

Summary

The podcast “Beyond the Pilot” examines how enterprise AI is moving out of the panic‑driven GPU hoarding phase and into a disciplined, cost‑focused era. Companies that once over‑provisioned GPUs as insurance are now confronting under‑utilization and tightening budgets.
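Why under-utilization hurts can be made concrete with simple arithmetic. The following is a minimal sketch, assuming a GPU billed at a flat hourly rate regardless of load; the function name, the $4/hour price, and the 50 inferences/second peak throughput are hypothetical illustrations, not figures from the podcast.

```python
# Minimal sketch: how idle, over-provisioned GPUs inflate effective cost per inference.
# All prices and throughput figures are hypothetical placeholders.


def cost_per_1k_inferences(gpu_hourly_usd: float,
                           peak_inferences_per_sec: float,
                           utilization: float) -> float:
    """Effective USD cost per 1,000 inferences served by one GPU.

    The GPU is paid for every hour whether or not it is busy, so the
    hourly price is spread over only the inferences actually served.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    served_per_hour = peak_inferences_per_sec * utilization * 3600
    return gpu_hourly_usd / served_per_hour * 1000


if __name__ == "__main__":
    # Hypothetical: a $4/hour GPU that can serve 50 inferences/sec at full load.
    for util in (0.05, 0.25, 0.60):
        cost = cost_per_1k_inferences(4.0, 50.0, util)
        print(f"{util:>4.0%} utilization -> ${cost:.4f} per 1k inferences")
```

Because the hourly price is fixed, effective cost scales with 1/utilization: in this toy example, the GPU costs 12 times more per inference at 5% utilization than at 60%.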

VentureBeat’s Q1 data shows GPU availability concerns fell from 20.8% to 15.4%, while concerns about cost per inference and TCO jumped from 34% to 41%. Seventy-two percent of respondents admit insufficient control over their AI infrastructure, and the share planning full-stack private AI infrastructure rose from 11% to 17%. Hyperscalers report that 60–80% of AI workloads have already shifted from training to inference.

LinkedIn’s new CTO, Erran Berger, described a “cookbook” approach: owning the entire stack and applying model pruning, embedding compression, custom GPU kernels, and tailored networking to squeeze out more throughput. He emphasized that instrumentation must quantify per-feature compute cost at scale and tie it directly to revenue impact.
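Tying per-feature compute cost to revenue can be sketched as a small attribution report. This is a hypothetical illustration, not LinkedIn’s actual tooling: it assumes GPU time has already been metered per feature and that some revenue attribution exists, and the `FeatureUsage` type, the feature names, and all numbers are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class FeatureUsage:
    name: str
    gpu_seconds: float             # metered GPU time for this feature in the window
    requests: int                  # requests served in the same window
    attributed_revenue_usd: float  # revenue credited to the feature (hypothetical attribution)


def feature_cost_report(usages, gpu_hourly_usd):
    """Map each feature to (compute cost, cost per 1k requests, cost-to-revenue ratio)."""
    report = {}
    for u in usages:
        cost = u.gpu_seconds / 3600 * gpu_hourly_usd
        per_1k = cost / u.requests * 1000 if u.requests else float("inf")
        ratio = cost / u.attributed_revenue_usd if u.attributed_revenue_usd else float("inf")
        report[u.name] = (cost, per_1k, ratio)
    return report


if __name__ == "__main__":
    # Hypothetical one-day window for two made-up features on a $4/hour GPU.
    usages = [
        FeatureUsage("feed_ranking", gpu_seconds=36_000, requests=2_000_000,
                     attributed_revenue_usd=400.0),
        FeatureUsage("search_suggest", gpu_seconds=7_200, requests=100_000,
                     attributed_revenue_usd=10.0),
    ]
    report = feature_cost_report(usages, gpu_hourly_usd=4.0)
    # Worst cost-to-revenue ratio first: these are the optimization candidates.
    for name, (cost, per_1k, ratio) in sorted(report.items(), key=lambda kv: -kv[1][2]):
        print(f"{name}: ${cost:.2f} total, ${per_1k:.3f}/1k requests, {ratio:.0%} of revenue")
```

Sorting by cost-to-revenue ratio puts the made-up `search_suggest` feature first: it spends 80 cents of compute per dollar of attributed revenue, versus 10 cents for `feed_ranking`, even though its absolute bill is smaller.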

The shift forces enterprises to prioritize observability, ROI modeling, and private-cloud or sovereign-cloud solutions to retain cost control and data safety. Organizations that fail to embed these disciplines risk stranded hardware and competitive disadvantage as AI becomes a baseline service rather than a speculative experiment.

Original Description

Enterprise GPU hoarding is over. LinkedIn CTO Erran Berger and VentureBeat analyst Rob Strechay break down what comes next — and the infrastructure math most enterprises are only now being forced to confront.
VentureBeat's Q1 research shows GPU availability anxiety dropped from 20.8% to 15.4% among enterprise teams, while cost-per-inference and TCO concerns jumped from 34% to 41% — a number that's still climbing. The hoarding phase is giving way to an audit phase, and the companies that didn't build the instrumentation to understand their workloads are now paying for it.
Erran Berger explains how LinkedIn runs one of the few remaining at-scale applied ML shops outside the hyperscalers, owning the full stack from bare-metal GPU clusters to member-facing products. That means LinkedIn engineers can optimize custom CUDA kernels, compress embeddings, prune models for throughput, and adapt networking and storage per workload: trade-offs that are simply unavailable on public cloud instance menus. The result is a rigorous ROI framework that evaluates not just current traffic costs, but the traffic shape agents will drive in 2–3 years.
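The description does not say which embedding-compression method LinkedIn uses. As a generic illustration only, the sketch below shows one common technique, symmetric per-vector int8 scalar quantization, in pure Python; the function names and the 768-dimension example are assumptions made for the sketch.

```python
import array
import random


def quantize_int8(vec):
    """Symmetric per-vector int8 quantization: returns (int8 codes, float scale)."""
    max_abs = max(abs(x) for x in vec)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round each component to the nearest step and clamp to the int8 range used.
    codes = array.array("b", (max(-127, min(127, round(x / scale))) for x in vec))
    return codes, scale


def dequantize_int8(codes, scale):
    """Approximate reconstruction of the original float vector."""
    return [c * scale for c in codes]


def storage_bytes(dim):
    """float32 storage vs int8 codes plus one float32 scale per vector."""
    return dim * 4, dim * 1 + 4


if __name__ == "__main__":
    random.seed(0)
    emb = [random.gauss(0.0, 1.0) for _ in range(768)]  # stand-in for a real embedding
    codes, scale = quantize_int8(emb)
    restored = dequantize_int8(codes, scale)
    max_err = max(abs(a - b) for a, b in zip(emb, restored))
    fp32, int8 = storage_bytes(len(emb))
    print(f"{fp32} bytes -> {int8} bytes, max abs error {max_err:.4f}")
```

Each component's reconstruction error is bounded by half a quantization step (`scale / 2`), and storage drops by roughly 4x; whether that accuracy loss is acceptable depends on the downstream retrieval or ranking task.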
On the market side, 72% of enterprises admit they lack sufficient control over their AI infrastructure. Open-source inference tools like vLLM and LLMD are seeing rapid adoption, while 17% of organizations have moved to full-stack ownership. Hyperscalers report 60–80% of workloads have already shifted from training to inference — and most enterprise teams are still figuring out how to staff and instrument for that reality.
🎙️ GUEST: Erran Berger | CTO, LinkedIn
🎙️ ANALYST: Rob Strechay | VentureBeat
🎙️ HOST: Matt Marshall | CEO, VentureBeat

00:00 Intro: The GPU Hoarding Hangover
00:10 Guest Introductions
02:00 VentureBeat Q1 Data: GPU Panic Fades, TCO Concerns Rise
03:00 LinkedIn's Early Shift to Inference ROI Discipline
04:00 Budget Moving Into Inference Optimization and Control
07:00 LinkedIn's Full-Stack Advantage: Kernels, Pruning, Embedding Compression
08:00 Private AI and Sovereign Stacks: What the Q1 Data Shows
09:00 Open Source Inference Tooling: vLLM, LLMD, RDMA
10:00 Data Sovereignty at LinkedIn Scale: Member Data and Board-Level ROI Framing
12:00 Why Instrumentation Beats GPU Hoarding
13:00 Planning for Ambient Agent Traffic — Not Just Today's Workloads
14:00 Closing Advice for the Enterprise CTO Staring at 5% GPU Utilization

