
Mercury 2’s diffusion architecture could reshape cost structures for real‑time AI services, giving enterprises a faster, cheaper alternative to transformer‑based models. Its early adoption may accelerate the shift toward non‑transformer architectures in the competitive LLM market.
The emergence of diffusion‑based language models marks a departure from the transformer paradigm that has dominated natural‑language processing for years. By treating text generation as a denoising process, diffusion models can update entire sequences in parallel, similar to how image diffusion reconstructs pixels. This architectural shift replaces one forward pass per generated token with a small, fixed number of parallel refinement steps, translating into lower latency and compute demand. For developers, the ability to reason over large contexts without the sequential bottleneck opens new possibilities in real‑time applications such as voice assistants, interactive coding environments, and dynamic search interfaces.
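To make the contrast with autoregressive decoding concrete, here is a minimal toy sketch of the masked‑diffusion idea: every position starts masked, and each step unmasks a batch of positions in parallel. The `toy_denoise_step` function and its random selection of positions are illustrative stand‑ins only; a real diffusion LM predicts a vocabulary distribution for every position at once and unmasks by confidence.

```python
import random

MASK = "<mask>"

def toy_denoise_step(sequence, fraction):
    """Unmask a fraction of the remaining masked positions in parallel."""
    masked = [i for i, tok in enumerate(sequence) if tok == MASK]
    if not masked:
        return sequence
    # A real model picks positions by prediction confidence; random here.
    k = max(1, int(len(masked) * fraction))
    for i in random.sample(masked, min(k, len(masked))):
        sequence[i] = f"tok{i}"  # placeholder for the model's prediction
    return sequence

def generate(length=16, steps=4):
    """Denoise a fully masked sequence in a fixed number of parallel steps."""
    seq = [MASK] * length
    for _ in range(steps):
        seq = toy_denoise_step(seq, fraction=0.5)
    # Final pass: resolve any positions still masked.
    return [tok if tok != MASK else f"tok{i}" for i, tok in enumerate(seq)]

print(generate())
```

The key point of the sketch is the loop bound: generation cost scales with the number of denoising steps, not the sequence length, which is where the latency advantage over token‑by‑token decoding comes from.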
Mercury 2 leverages this principle to deliver 1,009 tokens per second on Nvidia Blackwell GPUs, achieving an end‑to‑end latency of just 1.7 seconds—far quicker than Gemini 3 Flash's 14.4 seconds and Claude Haiku 4.5's 23.4 seconds when reasoning is enabled. The model's pricing of $0.25 per million input tokens and $0.75 per million output tokens undercuts those rival models by up to 75 percent, making high‑throughput AI economically viable for startups and large enterprises alike. Coupled with a 128K context window, tool integration, and native JSON output, Mercury 2 is positioned for latency‑sensitive workloads that previously required costly infrastructure.
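A quick calculation shows what the published per‑token rates mean in practice. The workload figures below (2,000 input tokens, 500 output tokens per request) are hypothetical, chosen only to illustrate how cost scales at these prices:

```python
# Mercury 2's published rates: $0.25 per million input tokens,
# $0.75 per million output tokens.
INPUT_RATE = 0.25 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.75 / 1_000_000  # dollars per output token

def request_cost(input_tokens, output_tokens):
    """Dollar cost of a single request at Mercury 2's list prices."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A hypothetical chat turn: 2,000 tokens of context in, 500 tokens out.
per_request = request_cost(2_000, 500)
print(f"per request: ${per_request:.6f}")                        # $0.000875
print(f"per million requests: ${per_request * 1_000_000:,.0f}")  # $875
```

At under a tenth of a cent per turn, even a service handling a million requests a day stays in the hundreds of dollars of daily model cost, which is the economics the article refers to.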
The launch arrives as major players such as Google DeepMind experiment with diffusion‑based LLMs, but Inception Labs is the first to commercialize the approach at scale. If Mercury 2’s benchmark performance—strong scores on GPQA Diamond, SciCode, and IFBench—holds across broader tasks, it could pressure incumbent transformer providers to accelerate research into alternative architectures. Investors have already signaled confidence, with a recent $50 million round led by Microsoft, Nvidia, and Snowflake. As the industry watches, diffusion models may become a viable third pillar alongside transformers and retrieval‑augmented generation, reshaping the competitive dynamics of the AI market.