Google Cloud C4 Brings a 70% TCO Improvement on GPT OSS with Intel and Hugging Face


Hugging Face · Oct 16, 2025

Why It Matters

The result makes large‑scale CPU inference more cost‑effective—potentially shifting cloud LLM economics, accelerating broader deployment of open models, and intensifying competitive pressure on instance pricing and architecture choices.

Summary

Intel and Hugging Face benchmarked OpenAI’s GPT OSS on Google Cloud’s new C4 VMs (Intel Xeon 6/Granite Rapids) and report a 1.7x improvement in total cost of ownership versus prior-generation C3 instances. The C4 machines delivered 1.4x–1.7x better throughput per vCPU per dollar and lower hourly prices in steady-state text‑generation tests using the unsloth/gpt-oss-120b-BF16 model with bfloat16 precision and optimized MoE execution.
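The TCO comparison reduces to throughput per dollar of instance time. A minimal sketch of that arithmetic, using hypothetical throughput and hourly-price figures (not taken from the benchmark) chosen only to show how a ~1.7x ratio can arise:

```python
# Illustrative TCO arithmetic: tokens generated per dollar on two instance types.
# All numeric figures below are hypothetical placeholders, not benchmark results.

def tokens_per_dollar(tokens_per_sec: float, price_per_hour: float) -> float:
    """Tokens generated per dollar of instance time."""
    tokens_per_hour = tokens_per_sec * 3600
    return tokens_per_hour / price_per_hour

# Hypothetical steady-state figures for a C3 and a C4 instance.
c3 = tokens_per_dollar(tokens_per_sec=100.0, price_per_hour=2.00)
c4 = tokens_per_dollar(tokens_per_sec=160.0, price_per_hour=1.88)

ratio = c4 / c3  # > 1 means C4 delivers more tokens per dollar
print(f"C4 vs C3 tokens-per-dollar ratio: {ratio:.2f}")  # → 1.70
```

With these made-up inputs, a 1.6x throughput gain combined with a modestly lower hourly price compounds to roughly 1.7x tokens per dollar, which is the kind of composition behind the reported figure.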

