Google Cloud C4 Brings a 70% TCO Improvement on GPT OSS with Intel and Hugging Face


Hugging Face · Oct 16, 2025

Why It Matters

The result makes large‑scale CPU inference more cost‑effective—potentially shifting cloud LLM economics, accelerating broader deployment of open models, and intensifying competitive pressure on instance pricing and architecture choices.

Summary

Intel and Hugging Face benchmarked OpenAI’s GPT OSS on Google Cloud’s new C4 VMs (Intel Xeon 6/Granite Rapids) and report a 1.7x improvement in total cost of ownership versus prior-generation C3 instances. The C4 machines delivered 1.4x–1.7x better throughput per vCPU per dollar and lower hourly prices in steady-state text‑generation tests using the unsloth/gpt-oss-120b-BF16 model with bfloat16 precision and optimized MoE execution.
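The TCO comparison reduces to throughput per dollar of instance time. A minimal sketch of that arithmetic, using hypothetical throughput and hourly-price figures (not taken from the benchmark) chosen only to show how a ~1.7x ratio can arise:

```python
# Illustrative TCO arithmetic: tokens generated per dollar on two instance types.
# All numeric figures below are hypothetical placeholders, not benchmark results.

def tokens_per_dollar(tokens_per_sec: float, price_per_hour: float) -> float:
    """Tokens generated per dollar of instance time."""
    tokens_per_hour = tokens_per_sec * 3600
    return tokens_per_hour / price_per_hour

# Hypothetical steady-state figures for a C3 and a C4 instance.
c3 = tokens_per_dollar(tokens_per_sec=100.0, price_per_hour=2.00)
c4 = tokens_per_dollar(tokens_per_sec=160.0, price_per_hour=1.88)

ratio = c4 / c3  # > 1 means C4 delivers more tokens per dollar
print(f"C4 vs C3 tokens-per-dollar ratio: {ratio:.2f}")  # → 1.70
```

With these made-up inputs, a 1.6x throughput gain combined with a modestly lower hourly price compounds to roughly 1.7x tokens per dollar, which is the kind of composition behind the reported figure.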

