
AI IQ Is Here: A New Site Scores Frontier AI Models on the Human IQ Scale. The Results Are Already Dividing Tech.
Why It Matters
The framework gives enterprise buyers a single, comparable view of model performance and cost that can speed up purchasing decisions, while the debate it has sparked underscores the need for more nuanced, transparent AI evaluation methods.
Key Takeaways
- OpenAI's GPT‑5.5 tops the AI IQ chart with an estimated IQ of 136
- Anthropic's Opus 4.7 leads the EQ ranking at roughly 132
- Mid‑tier models deliver IQs of 112‑120 for $1‑$5 per task
- Critics warn that a single IQ number masks uneven model capabilities
Pulse Analysis
The AI IQ platform attempts to bring the familiar language of human intelligence testing to the fragmented world of large‑language‑model benchmarking. It aggregates twelve diverse tests into four reasoning dimensions (abstract, mathematical, programmatic and academic) and combines them into a composite IQ score that can be compared across providers. The approach appeals to technologists who struggle to interpret sprawling leaderboards, because it offers a quick visual cue of where a model sits on the intelligence spectrum. However, the methodology relies on hand‑calibrated difficulty curves and conservative handling of missing data, which some researchers argue makes the scores opaque and potentially biased.
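The site's actual difficulty curves are not described here, so the following is only a minimal sketch of how a composite on the human IQ scale (mean 100, standard deviation 15) could be formed. It assumes each of the four reasoning dimensions has already been converted into a standardized z-score, and it uses a hypothetical stand-in for "conservative" missing-data handling: a missing dimension is filled with the model's lowest observed dimension score. The function names and the example scores are illustrative, not the site's method or data.

```python
from statistics import mean
from typing import Optional

IQ_MEAN = 100.0  # human IQ scale is normed to a mean of 100
IQ_SD = 15.0     # and a standard deviation of 15


def z_to_iq(z: float) -> float:
    """Map a standardized (z) score onto the human IQ scale."""
    return IQ_MEAN + IQ_SD * z


def composite_iq(dimension_z: dict[str, Optional[float]]) -> float:
    """Hypothetical composite: average the per-dimension z-scores.

    Missing dimensions (None) are filled with the lowest observed score,
    an ASSUMED conservative rule, not the AI IQ site's documented one.
    """
    observed = [z for z in dimension_z.values() if z is not None]
    if not observed:
        raise ValueError("no dimension scores available")
    floor = min(observed)
    filled = [floor if z is None else z for z in dimension_z.values()]
    return z_to_iq(mean(filled))


# Illustrative z-scores for the four dimensions (made up for this sketch).
example_scores = {
    "abstract": 2.0,
    "mathematical": 2.5,
    "programmatic": 2.2,
    "academic": None,  # missing benchmark results
}
```

Here the missing "academic" score is filled with 2.0, the four z-scores average to 2.175, and `composite_iq(example_scores)` comes out at about 132.6. Filling gaps with the weakest observed dimension pulls the composite down rather than rewarding a model for skipped tests, which is one plausible reading of "conservative" handling.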
Beyond raw intelligence, AI IQ also reports an emotional‑intelligence (EQ) metric, which makes its rankings more relevant to conversational and collaborative applications. Anthropic’s Opus 4.7 currently sits in the upper‑right quadrant of the IQ‑vs‑EQ scatter plot, indicating strong performance in both cognitive and affective domains. The EQ scores blend model‑generated Elo ratings with human‑judged arena matches, and the site applies a corrective penalty to Anthropic models to offset self‑scoring bias. This dual‑axis view helps enterprises prioritize models that not only solve complex problems but also maintain user trust and engagement.
Perhaps the most actionable insight for CIOs is the IQ‑vs‑effective‑cost chart, which plots model intelligence against the token cost of a task with 2 million input tokens and 1 million output tokens. Top‑tier models such as GPT‑5.5 and Opus 4.7 cost $30‑$50 per task, while a cluster of mid‑range models, including DeepSeek‑V3.2 and MiniMax‑M2.7, offer IQ scores in the low‑120s for as little as $1‑$5. This compression of cost and performance points toward model orchestration: reserving expensive, high‑IQ models for niche challenges and routing routine workloads to cheaper, sufficiently capable alternatives. Despite the debate over its methodology, the AI IQ site provides a rare, consolidated lens for navigating this evolving landscape.
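The effective-cost figure and the orchestration idea can be sketched in a few lines. This assumes providers quote prices in USD per million input and output tokens, so one 2M-input/1M-output task costs 2 × input price + 1 × output price. The routing rule (pick the cheapest model whose IQ clears a task-specific threshold) is a hypothetical policy for illustration, and the model names and prices below are invented, not taken from the site.

```python
from dataclasses import dataclass
from typing import Optional

# Task shape used by the IQ-vs-effective-cost chart.
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 1_000_000


@dataclass
class Model:
    name: str
    iq: float
    input_price_per_m: float   # USD per 1M input tokens (assumed pricing unit)
    output_price_per_m: float  # USD per 1M output tokens (assumed pricing unit)


def effective_cost(m: Model) -> float:
    """Cost in USD of one 2M-input / 1M-output task."""
    return (INPUT_TOKENS / 1e6) * m.input_price_per_m + (
        OUTPUT_TOKENS / 1e6
    ) * m.output_price_per_m


def cheapest_sufficient(models: list[Model], min_iq: float) -> Optional[Model]:
    """Hypothetical router: lowest-cost model whose IQ meets the threshold."""
    capable = [m for m in models if m.iq >= min_iq]
    return min(capable, key=effective_cost, default=None)


# Illustrative catalog: names and prices are hypothetical.
catalog = [
    Model("premium-model", iq=135, input_price_per_m=5.0, output_price_per_m=30.0),
    Model("mid-tier-model", iq=120, input_price_per_m=0.5, output_price_per_m=2.0),
]
```

With these made-up prices the premium model costs $40 per task and the mid-tier model $3, so a routine task with `min_iq=115` is routed to the mid-tier model, while a hard task with `min_iq=130` falls through to the premium one. A real router would also weigh the uneven, per-dimension capabilities that critics say a single IQ number hides.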