Inception Labs' Mercury 2 AI Outperforms Google's DiffusionGemma on Key Benchmarks

Created at 21 Jun · 4:06 PM · 1 source · Market-relevant

IN SHORT

Inception Labs' Mercury 2 AI model has outperformed Google's DiffusionGemma on mathematics and science reasoning benchmarks, even though both models rely on similar parallel generation techniques for speed. Mercury 2 scored substantially higher on the AIME 2026 mathematics test (90% vs 69.1%) and held a narrower lead on the GPQA science benchmark (77% vs 73.2%).

Key Numbers

Mercury 2 generation speed: 1,000 tokens/sec

Mercury 2 AIME 2026 score: 90%

DiffusionGemma AIME 2026 score: 69.1%

Mercury 2 GPQA score: 77%

DiffusionGemma GPQA score: 73.2%

Anthropic Claude Haiku 4.5 Reasoning speed: 89 tokens/sec

OpenAI GPT-5 Mini speed: 71 tokens/sec

Latency reduction with Mercury 2 at Augment Code: 82%

Cost reduction with Mercury 2 at Augment Code: 90%

Who's Involved

Inception Labs

Developer of the Mercury 2 AI model

Google

Developer of the DiffusionGemma AI model

Stefano Ermon

Founder of Inception Labs and Stanford professor

Nvidia

Investor in Inception Labs through its venture arm

↳ Why This Matters

If the results hold up, Mercury 2 shows that parallel generation can deliver strong reasoning at more than ten times the quoted output speed of fast models such as Claude Haiku 4.5 Reasoning and GPT-5 Mini, which could lower costs and make AI applications feel more responsive. Its lead over Google's DiffusionGemma also shows how competitive this emerging class of models has become and how much weight developers now place on parallel generation techniques.

Key facts

Inception Labs' Mercury 2 AI model generates approximately 1,000 tokens per second.

Mercury 2 scored 90% on the AIME 2026 mathematics benchmark.

Google's DiffusionGemma scored 69.1% on the same AIME 2026 benchmark.

Mercury 2 achieved a 77% score on the GPQA benchmark, compared to DiffusionGemma's 73.2%.

Mercury 2 is a paid, closed-weight API model, while DiffusionGemma is free and open-weight.

Inception Labs has launched Mercury 2, which it bills as the world's fastest reasoning language model, generating approximately 1,000 tokens per second. That puts it in the same speed class as Google's recently announced DiffusionGemma. Both models reach these speeds through parallel generation rather than the sequential, token-by-token decoding used by conventional language models.

However, Mercury 2 has demonstrated superior performance on key benchmarks. On the AIME 2026 mathematics test, Mercury 2 scored 90%, well ahead of Google's DiffusionGemma at 69.1%. On GPQA, a PhD-level science benchmark, Mercury 2 scored 77% compared with DiffusionGemma's 73.2%. Google's own documentation suggests that its standard Gemma 4 model outperforms DiffusionGemma on quality metrics.

Customer deployments point the same way: in a case study with AI coding-agent company Augment Code, replacing other models with Mercury 2 cut latency by 82% and cost by 90% while maintaining output quality. Inception Labs, founded by Stanford professor Stefano Ermon, is backed by investors including Nvidia's venture arm, Andrew Ng, and Andrej Karpathy.
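The Augment Code figures are percentage reductions. Assuming each one is measured relative to the setup Mercury 2 replaced (the exact baselines are not given here), they convert into speedup and savings factors as follows, where L0 and C0 stand for the hypothetical baseline latency and cost:

```latex
\begin{aligned}
L_{\text{new}} &= (1 - 0.82)\,L_0 = 0.18\,L_0, &\qquad \frac{L_0}{L_{\text{new}}} &= \frac{1}{0.18} \approx 5.6 \\
C_{\text{new}} &= (1 - 0.90)\,C_0 = 0.10\,C_0, &\qquad \frac{C_0}{C_{\text{new}}} &= \frac{1}{0.10} = 10
\end{aligned}
```

Under that reading, responses arrive roughly 5.6 times faster and cost one tenth as much.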

The parallel diffusion approach makes AI systems feel more responsive, allowing rapid iteration and letting a larger system run multiple specialized AI agents efficiently. Mercury 2 is a closed-weight, API-only model and its ecosystem is still developing, but its performance on commodity GPUs suggests significant potential for cost and energy savings at scale, particularly in speed-sensitive applications such as real-time coding and voice interfaces.
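In speed-sensitive uses such as real-time coding, throughput determines how long a user waits for a complete answer. The following minimal sketch converts the tokens-per-second figures from the Key Numbers list into wall-clock time for a response of a given length. It assumes a constant decoding rate and ignores time to first token, prompt processing, and network overhead. The article does not say whether the three figures were measured under comparable conditions, and the 2,000-token response length is an arbitrary, hypothetical choice.

```python
# Throughput figures as quoted in the article; measurement conditions unknown.
SPEEDS_TOKENS_PER_SEC = {
    "Mercury 2": 1000.0,
    "Claude Haiku 4.5 Reasoning": 89.0,
    "GPT-5 Mini": 71.0,
}


def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Seconds needed to emit num_tokens at a constant throughput."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


def compare(num_tokens: int) -> dict[str, float]:
    """Map each model name to its estimated generation time in seconds."""
    return {
        name: generation_seconds(num_tokens, speed)
        for name, speed in SPEEDS_TOKENS_PER_SEC.items()
    }


if __name__ == "__main__":
    # Hypothetical 2,000-token output (reasoning trace plus final answer).
    for name, secs in compare(2000).items():
        print(f"{name:>28}: {secs:5.1f} s")
```

For a 2,000-token response this works out to about 2.0 s for Mercury 2, 22.5 s for Claude Haiku 4.5 Reasoning, and 28.2 s for GPT-5 Mini.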

Frequently asked questions

What is the main advantage of Mercury 2?

Speed: it generates approximately 1,000 tokens per second while also performing strongly on reasoning benchmarks such as AIME 2026 and GPQA.

How does Mercury 2 differ from Google's DiffusionGemma?

Both use parallel generation, but Mercury 2 scores higher on key reasoning benchmarks, by a wide margin on AIME 2026 and a narrower one on GPQA. Mercury 2 is also a paid, closed-weight API model, whereas DiffusionGemma is free and open-weight.

What is parallel generation in AI models?

Instead of writing text word by word, the model fills a block of text with placeholder tokens and then refines the whole block over multiple passes, much as image generators turn random static into a picture.
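The following toy sketch illustrates that idea using one common scheduling choice from the open literature: start from a fully masked block, predict every masked position in a single pass, and commit only the most confident predictions each time. The `toy_denoiser` function is a hypothetical stand-in that simply peeks at the target text and draws random confidence scores; a real model would predict tokens from context. The article does not describe the exact schedules used by Mercury 2 or DiffusionGemma, and real systems may also re-mask low-confidence tokens, which this sketch omits.

```python
import random

MASK = "<mask>"


def toy_denoiser(tokens, target, rng):
    """Hypothetical stand-in for a trained diffusion language model.

    In one call (one "forward pass") it proposes a token and a confidence
    for every masked position at once. This toy version peeks at `target`
    and draws a random confidence; a real model would predict from context.
    """
    return {
        i: (target[i], rng.random())
        for i, tok in enumerate(tokens)
        if tok == MASK
    }


def parallel_decode(target, num_passes, seed=0):
    """Fill a fully masked block over at most `num_passes` refinement passes.

    Each pass predicts all masked positions in parallel and commits the most
    confident ones, spreading the work evenly over the remaining passes.
    Returns the final tokens and the list of intermediate states.
    """
    if num_passes < 1:
        raise ValueError("num_passes must be at least 1")
    rng = random.Random(seed)
    tokens = [MASK] * len(target)
    history = [list(tokens)]
    for step in range(num_passes):
        proposals = toy_denoiser(tokens, target, rng)
        if not proposals:
            break  # every position is already filled
        remaining = num_passes - step
        k = -(-len(proposals) // remaining)  # ceiling division
        most_confident = sorted(
            proposals.items(), key=lambda item: item[1][1], reverse=True
        )[:k]
        for pos, (tok, _confidence) in most_confident:
            tokens[pos] = tok
        history.append(list(tokens))
    return tokens, history


if __name__ == "__main__":
    words = "diffusion models refine a whole block of text at once".split()
    final, history = parallel_decode(words, num_passes=3)
    for state in history:
        print(" ".join(state))
    print(f"parallel passes: {len(history) - 1}, sequential steps: {len(words)}")
```

The point is the pass count: the ten-word block is filled in three parallel passes, where token-by-token decoding would need ten sequential steps. Each parallel pass processes the whole block, so fewer passes do not by themselves guarantee a proportional drop in latency.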

PiQ Daily
