
The NPU in Your Phone Keeps Improving—Why Isn’t that Making AI Better?

AI

Ars Technica AI • December 4, 2025

Companies Mentioned

  • Google (GOOG)
  • MediaTek (2454)
  • OpenAI

Why It Matters

Edge AI promises data‑privacy and offline functionality, but current hardware and model constraints keep most high‑value AI in the cloud, shaping where investment and innovation will flow.

Key Takeaways

  • NPUs improve speed but often remain underutilized.
  • Cloud models dwarf edge capacity in parameters and context.
  • Privacy concerns drive interest in on‑device AI despite limits.
  • Developers face fragmentation and model‑size constraints on phones.

Pulse Analysis

The latest wave of neural processing units reflects a decade‑long evolution from digital signal processors to specialized AI accelerators. Qualcomm’s Hexagon, MediaTek’s ninth‑generation NPU, and Google’s Tensor all claim 30‑40% speed gains, but real‑world usage shows many phones idle their AI engines except for lightweight tasks. This underutilization stems from a mismatch between the hardware’s parallel‑compute strengths and the limited, quantized models that can actually fit within a smartphone’s memory and power envelope.

Cloud‑based AI dwarfs edge capabilities in both scale and flexibility. Large language models such as Gemini or ChatGPT run with billions of parameters and token windows measured in millions, far beyond the few‑billion‑parameter, 3‑4‑GB memory sweet spot that mobile NPUs can handle. To make models run on‑device, engineers resort to aggressive quantization (dropping weights from FP16 down to FP4) and pruning, both of which erode accuracy. Consequently, on‑device AI is confined to narrow functions like image tagging, calendar suggestions, or voice transcription, while the bulk of generative workloads stay in data centers where compute and storage are abundant.
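The arithmetic behind that squeeze is easy to check. A rough sketch (the 7‑billion‑parameter model size is an illustrative assumption, not a figure from the article) shows why weights must shrink to roughly 4 bits before a "small" LLM even approaches the 3‑4‑GB budget a phone NPU can serve:

```python
# Back-of-envelope memory footprint of LLM weights at different
# numeric precisions. Figures are illustrative, not from the article.

def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """GiB needed to hold the weights alone (no activations or KV cache)."""
    return num_params * bits_per_param / 8 / 2**30

PARAMS_7B = 7e9  # assumed size of a typical small open-weight model

for label, bits in [("FP16", 16), ("INT8", 8), ("FP4", 4)]:
    print(f"{label}: {weight_memory_gib(PARAMS_7B, bits):.1f} GiB")
# FP16 lands around 13 GiB; only 4-bit weights fit a 3-4 GiB envelope.
```

Note this counts weights only; activation memory and the KV cache for long contexts add further overhead, which is part of why the article's "token windows measured in millions" stay cloud-side.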

Privacy and trust add another layer to the edge versus cloud calculus. Storing personal data locally reduces exposure to cloud‑side breaches and regulatory uncertainty, a point manufacturers highlight when marketing NPUs. Yet the performance gap and developer friction—fragmented SDKs, rapid model turnover, and limited app adoption—slow the migration to truly private AI. As regulators tighten data‑use rules and consumers grow wary of cloud‑based assistants, the incentive to invest in more capable, power‑efficient NPUs may rise, but only if the ecosystem can deliver models that balance size, accuracy, and real‑world utility.
