Google Cloud C4 Brings a 70% TCO Improvement on GPT OSS with Intel and Hugging Face
Why It Matters
The result makes large‑scale CPU inference more cost‑effective—potentially shifting cloud LLM economics, accelerating broader deployment of open models, and intensifying competitive pressure on instance pricing and architecture choices.
Summary
Intel and Hugging Face benchmarked OpenAI's GPT OSS on Google Cloud's new C4 VMs (Intel Xeon 6 "Granite Rapids") and report a 1.7x improvement in total cost of ownership over prior-generation C3 instances. In steady-state text-generation tests using the unsloth/gpt-oss-120b-BF16 model with bfloat16 precision and optimized MoE execution, the C4 machines delivered 1.4x–1.7x better throughput per vCPU per dollar alongside lower hourly prices.
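To see how a "1.7x TCO improvement" (i.e., 70%) follows from throughput and pricing, the perf-per-dollar arithmetic can be sketched as below. All numbers are illustrative placeholders chosen to land near the reported ratio, not figures from the Intel/Hugging Face benchmark.

```python
# Hedged sketch of the perf-per-dollar (TCO) comparison behind a claim
# like "1.7x better". The inputs below are hypothetical, not measured values.

def perf_per_dollar(tokens_per_sec_per_vcpu: float, price_per_vcpu_hour: float) -> float:
    """Throughput per vCPU divided by its hourly price: tokens per dollar-second."""
    return tokens_per_sec_per_vcpu / price_per_vcpu_hour

# Placeholder inputs: C4 with somewhat higher per-vCPU throughput
# and a slightly lower per-vCPU hourly price than C3.
c3 = perf_per_dollar(tokens_per_sec_per_vcpu=1.0, price_per_vcpu_hour=0.050)
c4 = perf_per_dollar(tokens_per_sec_per_vcpu=1.5, price_per_vcpu_hour=0.044)

improvement = c4 / c3
print(f"{improvement:.2f}x")  # ~1.70x with these placeholder inputs
```

A 1.7x ratio in throughput per dollar is the same claim as a 70% TCO improvement: the newer instance does 70% more work for the same spend.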