DeepSeek's New Models Are so Efficient They'll Run on a Toaster ... By Which We Mean Huawei's NPUs


The Register – AI/ML (data-related)
Apr 24, 2026

Why It Matters

By slashing inference memory and cost, DeepSeek V4 makes high‑end LLM capabilities accessible to smaller enterprises and accelerates adoption of non‑Western AI hardware, reshaping competitive dynamics in the generative AI market.

Key Takeaways

  • DeepSeek V4 offers 284B‑parameter Flash MoE and 1.6T‑parameter Pro models.
  • Hybrid attention cuts KV‑cache memory by 9.5‑13.7×, enabling million‑token context windows.
  • FP8/FP4 quantization reduces weight storage, lowering inference cost.
  • API pricing starts at $0.14 per million input tokens, undercutting OpenAI.
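To make the FP8/FP4 takeaway concrete, here is a back‑of‑envelope weight‑storage estimate in Python. It is a sketch, not DeepSeek's own accounting: the share of parameters held in FP4 is a hypothetical input (the article says only that expert parameters use FP4), and quantization scale factors and other metadata are ignored.

```python
# Back-of-envelope weight-storage estimate for mixed FP8/FP4 precision.
# Assumption (hypothetical): the share of parameters stored in FP4 is a free
# input, not a figure published by DeepSeek. Scale/metadata overhead ignored.

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "fp4": 0.5}


def weight_storage_gb(total_params: float, fp4_fraction: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    A fraction `fp4_fraction` of parameters is stored at 4 bits;
    the remainder is stored at 8 bits.
    """
    if not 0.0 <= fp4_fraction <= 1.0:
        raise ValueError("fp4_fraction must be between 0 and 1")
    fp4_bytes = total_params * fp4_fraction * BYTES_PER_PARAM["fp4"]
    fp8_bytes = total_params * (1.0 - fp4_fraction) * BYTES_PER_PARAM["fp8"]
    return (fp4_bytes + fp8_bytes) / 1e9


def bf16_storage_gb(total_params: float) -> float:
    """Storage for the same parameter count kept entirely in BF16."""
    return total_params * BYTES_PER_PARAM["bf16"] / 1e9


if __name__ == "__main__":
    flash_params = 284e9  # Flash model size quoted in the article
    for frac in (0.0, 0.5, 0.9):  # hypothetical FP4 shares
        print(
            f"FP4 share {frac:.0%}: {weight_storage_gb(flash_params, frac):.0f} GB "
            f"(BF16 baseline {bf16_storage_gb(flash_params):.0f} GB)"
        )
```

With a hypothetical 90% FP4 share, the 284B‑parameter Flash model would need roughly 156 GB for weights, against about 568 GB if every parameter were stored in BF16.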

Pulse Analysis

The arrival of DeepSeek V4 marks a pivotal moment for the open‑weights AI movement, which has long chased the performance of proprietary giants such as OpenAI and Anthropic. Building on the reputation earned by its V3 and R1 series, DeepSeek now offers two model sizes, a 284‑billion‑parameter Flash mixture‑of‑experts (MoE) model and a 1.6‑trillion‑parameter Pro variant, both released on Hugging Face and through the company’s API. By making the weights publicly downloadable, DeepSeek reinforces the trend toward democratized large language models that organizations can fine‑tune or deploy in‑house without licensing barriers.

The technical leap stems from a hybrid attention architecture that blends Compressed Sparse Attention with Heavy Compressed Attention, cutting the KV‑cache memory footprint by 9.5‑13.7× and enabling a one‑million‑token context window. A mixed‑precision scheme, with FP8 for most weights and FP4 for expert parameters, halves weight storage compared with the model’s predecessor. DeepSeek also trains with the Muon optimizer, which speeds convergence, and has validated the models on Huawei’s Ascend NPUs as well as Nvidia GPUs, broadening hardware options for cost‑sensitive deployments.
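The KV‑cache claim can be sanity‑checked with the standard sizing formula: 2 (keys and values) × layers × KV heads × head dimension × bytes per element × tokens. The Python sketch below applies it to a hypothetical standard‑attention baseline (60 layers, 8 KV heads, head dimension 128, BF16). These dimensions are illustrative, not DeepSeek V4’s, and the article’s 9.5× and 13.7× factors are simply applied as divisors rather than modeled from Compressed Sparse Attention or Heavy Compressed Attention.

```python
# Rough KV-cache sizing for long-context inference.
# Assumption: the baseline is a hypothetical standard-attention model with the
# dimensions below; the article's 9.5x-13.7x factors are applied as plain
# divisors, not derived from DeepSeek's actual attention design.

from dataclasses import dataclass


@dataclass(frozen=True)
class AttnConfig:
    num_layers: int      # transformer layers that keep a KV cache
    num_kv_heads: int    # key/value heads per layer
    head_dim: int        # dimension of each head
    bytes_per_elem: int  # 2 for BF16/FP16, 1 for FP8


def kv_cache_bytes(cfg: AttnConfig, seq_len: int, batch: int = 1) -> int:
    """Bytes needed to cache keys and values for every token in context."""
    per_token = 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * cfg.bytes_per_elem
    return per_token * seq_len * batch


def compressed_gb(raw_bytes: int, reduction: float) -> float:
    """Apply a claimed memory-reduction factor and convert to GB (1e9 bytes)."""
    return raw_bytes / reduction / 1e9


if __name__ == "__main__":
    baseline = AttnConfig(num_layers=60, num_kv_heads=8, head_dim=128, bytes_per_elem=2)
    raw = kv_cache_bytes(baseline, seq_len=1_000_000)
    print(f"baseline KV cache: {raw / 1e9:.1f} GB")
    for factor in (9.5, 13.7):
        print(f"with {factor}x reduction: {compressed_gb(raw, factor):.1f} GB")
```

Under these assumed dimensions, a one‑million‑token context needs about 246 GB of KV cache per sequence in BF16; dividing by the reported factors brings that to roughly 18 to 26 GB, which illustrates why the reduction matters for million‑token windows.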

From a business perspective, DeepSeek’s pricing of $0.14 per million input tokens for the smaller Flash model and $1.74 for the Pro version undercuts OpenAI’s $5‑$30 range, positioning V4 as an attractive alternative for startups and enterprises seeking scalable AI without prohibitive cloud bills. The ability to run on Huawei accelerators also opens a pathway for customers in regions where access to Western chips is restricted, potentially reshaping the global AI supply chain. However, real‑world performance and ecosystem support will determine whether the cost advantage translates into sustained market share.
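The price gap is easiest to see as a monthly bill. The Python sketch below multiplies a hypothetical workload of 2 billion input tokens per month by each per‑million‑token price. It ignores output tokens, caching discounts and volume tiers, and it treats the $5 low end of the quoted OpenAI range as a per‑million‑input‑token price, which is an assumption.

```python
# Compare monthly input-token spend at per-million-token list prices.
# Assumptions: the workload volume is hypothetical, output tokens are ignored,
# and the OpenAI figure is the $5 low end of the range quoted in the article,
# treated here as a per-million-input-token price.

PRICE_PER_M_INPUT = {
    "DeepSeek V4 Flash": 0.14,
    "DeepSeek V4 Pro": 1.74,
    "OpenAI (low end of quoted range)": 5.00,
}


def monthly_input_cost(tokens_per_month: int, price_per_million: float) -> float:
    """Dollar cost of `tokens_per_month` input tokens at a per-million price."""
    return tokens_per_month / 1_000_000 * price_per_million


if __name__ == "__main__":
    workload = 2_000_000_000  # hypothetical: 2B input tokens per month
    for name, price in PRICE_PER_M_INPUT.items():
        print(f"{name:35s} ${monthly_input_cost(workload, price):,.2f}")
```

At that volume the sketch gives $280 for Flash, $3,480 for Pro and $10,000 at the $5 rate; real bills also depend on output‑token pricing, which the article does not quote.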
