Why the Agentic Era Is Already Hitting Resource Walls

KPMG US
May 29, 2026

Why It Matters

Resource constraints will dictate which enterprises can scale AI agents profitably, turning token management into a core competitive capability.

Key Takeaways

  • Agent deployment is outpacing compute, creating immediate resource bottlenecks.
  • Inference costs fall, but usage growth outstrips savings dramatically.
  • Multi-model evaluation is essential to avoid lock-in and ensure resilience.
  • Token-management tools are needed to prevent runaway consumption and downtime.
  • Early adopters gain advantage; laggards risk costly AI implementation failures.

Summary

The episode examines how the emerging "agentic era," in which autonomous AI agents operate at enterprise scale, is already colliding with hard resource limits. Hosts Nathaniel Whittemore (NLW) and KPMG’s Steve Chase discuss the rapid shift from experimental agents to production-grade workloads, and why token efficiency and compute availability have become strategic concerns.

Key insights reveal a paradox: inference costs are falling roughly tenfold each year, yet token consumption is exploding, potentially a hundredfold, so overall compute demand keeps rising and capacity stays under pressure. OpenAI’s recent shutdown of Sora illustrates that even well-funded model providers must ration compute between training and inference, while enterprises lose the cheap token subsidies that once made unlimited usage feasible.
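The arithmetic behind that paradox can be sketched in a few lines. This is only an illustration: it assumes the tenfold price drop and the hundredfold usage growth apply over the same period, which the episode summary does not state precisely, and the function name is hypothetical.

```python
def net_spend_multiplier(cost_drop_factor: float, usage_growth_factor: float) -> float:
    """Return the factor by which total spend changes when the per-token
    price falls by cost_drop_factor and token usage grows by
    usage_growth_factor over the same period (an assumption)."""
    if cost_drop_factor <= 0 or usage_growth_factor <= 0:
        raise ValueError("factors must be positive")
    return usage_growth_factor / cost_drop_factor


# Figures taken from the summary: price falls ~10x, usage grows up to ~100x.
multiplier = net_spend_multiplier(cost_drop_factor=10, usage_growth_factor=100)
print(f"Total spend changes by a factor of {multiplier:g}")
```

Under those assumptions total spend still rises about tenfold, which is why cheaper tokens alone do not relieve the pressure on budgets or capacity.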

The conversation highlights concrete examples: KPMG’s pulse report confirming widespread agent adoption, Meta’s internal token-maxing leaderboard, and the need for multi-model evaluation frameworks to avoid lock-in. Both providers and users must build robust monitoring, automated throttling, and model-selection strategies, because a single runaway agent can consume thousands of dollars in tokens, which in effect acts as a denial-of-service attack.
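One way to picture the automated throttling described above is a per-agent token budget that refuses further work once a cap is reached. The sketch below is a minimal illustration, not a method described in the episode: `TokenBudget`, `BudgetExceeded`, and the limit value are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a request would push an agent past its token cap."""


@dataclass
class TokenBudget:
    limit: int      # hypothetical per-agent token cap
    used: int = 0   # tokens charged so far

    @property
    def remaining(self) -> int:
        return self.limit - self.used

    def charge(self, tokens: int) -> None:
        """Record token usage, rejecting any request that would exceed the cap."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.limit:
            raise BudgetExceeded(
                f"request for {tokens} tokens exceeds remaining {self.remaining}"
            )
        self.used += tokens


# Usage: the third call is rejected instead of silently running up cost.
budget = TokenBudget(limit=1_000)
budget.charge(400)
budget.charge(500)
try:
    budget.charge(200)
except BudgetExceeded as err:
    print(f"Agent throttled: {err}")
```

A real deployment would likely tie the cap to a time window and to spend in dollars rather than raw tokens; the point here is only that the check happens before the tokens are consumed.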

For businesses, the implication is clear: successful AI integration now hinges on systems thinking, token‑budget governance, and resilient architecture. Leaders who establish rigorous evals and flexible model pipelines will capture productivity gains, while laggards risk spiraling costs and operational disruption as the agentic workload expands.

Original Description

In this trendline episode, You Can with AI co-host NLW sits down with KPMG’s Steve Chase to unpack the resource constraints defining the agentic era, from growing competition between training and inference to the reality that agent usage is scaling faster than costs are falling. Together, they explore what separates leading organizations from the rest, including systems thinking around model selection, token management, and building resilience into agentic deployments before it’s too late.
This episode explains:
• Why AI agents are becoming real inside enterprises
• How token usage, inference costs, and compute constraints are emerging as new business limits
• Why long-running, token-hungry agents change AI economics entirely
• The growing need for systems thinking, resilience, and multi-model AI strategies
You’ll also learn:
• How leading organizations think about AI resource management
• Why agentic systems demand new operating models
• What it means to design AI environments that can withstand model failures, cost spikes, and capacity constraints
If you are responsible for enterprise AI strategy, agent deployment, or AI governance, this conversation outlines the challenges that matter now — not the ones from last year.
