Google's Gemini 3.5 Flash Follows Anthropic and OpenAI in Making Newer AI Models Significantly Pricier

•May 20, 2026

THE DECODER•May 20, 2026

Companies Mentioned

Google

GOOG

OpenAI

Google DeepMind

Anthropic

Why It Matters

Rising inference costs shift AI budgeting toward models that deliver more work per token, prompting enterprises to balance capability gains against higher operational spend.

Key Takeaways

•Gemini 3.5 Flash costs 5.5× Gemini 3 Flash, 75% more than Pro model
•Token prices rose to $1.50 input / $9 output per million tokens
•Agent tasks double token usage, driving total cost despite lower per‑token rates
•Hallucination rate drops to 61% but stays above leading models
•Coding score 45, ten points behind Gemini 3.1 Pro, limiting developer appeal

Pulse Analysis

Google’s Gemini 3.5 Flash arrives as the newest flagship in the Flash family, promising a one‑million‑token context window, video and audio inputs, and a 70 % speed boost over its predecessor. However, the model’s per‑token pricing jumped to $1.50 for input and $9 for output, and benchmark tests show it consumes roughly five times the tokens of Gemini 3 Flash on agent‑centric workloads. The result is a total cost that eclipses even the higher‑priced Gemini 3.1 Pro, highlighting a broader industry shift where providers raise both base rates and compute demand to fund more sophisticated, tool‑using AI behavior.

Performance-wise, Gemini 3.5 Flash climbs to an intelligence score of 55 on the Artificial Analysis Index, outpacing rivals such as Grok 4.3 and Claude Sonnet 4.6. Its multimodal MMMU‑Pro score hits a record‑breaking 84 %, and output throughput exceeds 280 tokens per second. Yet the model’s hallucination rate, though reduced to 61 %, lags behind best‑in‑class systems, and its coding index score of 45 falls ten points short of the Gemini 3.1 Pro preview. These mixed results illustrate the trade‑off between raw speed and specialized competence that enterprises must weigh.

For businesses, the higher cost structure forces a more nuanced ROI calculation. Simple, high‑volume tasks like code generation may still favor cheaper, older models, while complex, multi‑step agent workflows could justify the premium if they deliver measurable productivity gains. Companies are likely to adopt a tiered strategy—deploying Gemini 3.5 Flash for strategic, knowledge‑intensive use cases and retaining lower‑cost options for routine automation. As compute efficiency becomes the primary differentiator, vendors that can deliver more output per token will gain a competitive edge, and enterprises that monitor token utilization closely will better manage the escalating AI spend.

Google's Gemini 3.5 Flash follows Anthropic and OpenAI in making newer AI models significantly pricier

Read Original Article

Comments

Want to join the conversation?

Loading comments...