Grok 4 - 10 New Things to Know

AI Explained, Jul 10, 2025

Why It Matters

If Grok 4’s benchmark gains translate to real-world use, it could shift enterprise and developer choices and intensify competition among model providers. But cost, hallucination risk and uneven multimodal performance mean buyers must assess actual utility, not just headline scores.

Summary

xAI’s Grok 4 debuts as a top-performing large language model, outperforming rival models on several academic, coding and fluid-intelligence benchmarks and scoring particularly well on the semi-private ARC-AGI 2 test. Elon Musk and xAI tout “postgraduate/PhD-level” performance, but the presenter cautions that this claim is benchmark-dependent and that the model remains prone to hallucinations and is sometimes slow or weaker on visual tasks. The Grok 4 Heavy variant uses parallel agents that reason like a “study group” to boost results, and a premium SuperGrok Heavy tier is priced at $300/month, with planned features such as video generation. The published benchmark comparisons are also criticized for selective choice of rivals and exaggerated scales, so practical superiority and value versus cheaper alternatives such as Gemini Pro remain uncertain.

Original Description

Grok 4 is here, but did you know these 10 things about the new model? From benchmark caveats to soloing science, $300 a month secrets to Grok 5 promises, here's 10 new things to know in just under 12 minutes.
Chapters:
00:00 - Introduction
00:22 - Benchmark Results
02:11 - Benchmark Caveats
02:59 - ARC-AGI 2
03:35 - SimpleBench
04:49 - ‘Humanity’s Last Exam’
07:20 - SuperGrok Heavy Price
07:58 - API Price
08:12 - Grok 5, Gemini 3.0 Beta, GPT-5
09:12 - System Prompt Change + $1B a month, pollution
10:20 - Not soloing science, helping you solo code
Humanity’s Last Exam: https://agi.safe.ai/
