Zoom Says It Aced AI’s Hardest Exam. Critics Say It Copied Off Its Neighbors.

VentureBeat • December 16, 2025

Companies Mentioned

Google (GOOG)

OpenAI

Anthropic

Microsoft (MSFT)

Kaggle

Sierra

Why It Matters

Zoom’s result challenges the industry’s focus on proprietary model development and suggests orchestration could become a competitive advantage for enterprise AI.

Key Takeaways

•Zoom used a federated AI approach rather than its own LLM.
•Scored 48.1% on Humanity's Last Exam, beating Gemini 3 Pro.
•Approach sparked debate over what counts as AI innovation.
•Orchestration layer reduces vendor lock‑in for enterprise customers.
•Critics say claim masks reliance on external models.

Pulse Analysis

The Humanity's Last Exam benchmark is designed to test deep reasoning, multi‑step problem solving, and cross‑domain knowledge, tasks that have long eluded even the most advanced large language models. Zoom’s claimed score of 48.1%, a 2.3‑percentage‑point gain over Google’s previous best, is noteworthy not because it reflects a new model but because it demonstrates how a federated architecture can extract incremental performance from existing APIs. By deploying a Z‑scorer to evaluate and combine outputs from several providers, Zoom effectively creates an ensemble that outperforms any single provider on this specific test.
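To make the ensemble idea concrete, here is a minimal Python sketch of a federated answer selector. It is an illustration under stated assumptions, not Zoom's implementation: the article does not describe how the Z‑scorer works, so a simple agreement vote stands in for it, and the provider names, the `federated_answer` function, and the stub models are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for a call to an external model API:
# takes a question, returns that provider's answer as text.
ModelFn = Callable[[str], str]


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different answers match."""
    return " ".join(answer.lower().split())


def agreement_scores(candidates: Dict[str, str]) -> Dict[str, float]:
    """Score each provider's answer by the fraction of providers giving the same answer.

    Assumption: this majority-agreement vote is a placeholder for the
    Z-scorer, whose actual scoring method is not described in the article.
    """
    counts = Counter(normalize(a) for a in candidates.values())
    total = len(candidates)
    return {name: counts[normalize(a)] / total for name, a in candidates.items()}


def federated_answer(question: str, models: Dict[str, ModelFn]) -> Tuple[str, str, float]:
    """Query every provider, score the answers, and return the top-scoring one."""
    candidates = {name: fn(question) for name, fn in models.items()}
    scores = agreement_scores(candidates)
    # On ties, max() keeps the first provider in dictionary order.
    best = max(scores, key=scores.get)
    return best, candidates[best], scores[best]


# Stub providers in place of real API clients (hypothetical names).
models = {
    "provider_a": lambda q: "Paris",
    "provider_b": lambda q: " paris ",
    "provider_c": lambda q: "Lyon",
}

provider, answer, score = federated_answer("What is the capital of France?", models)
print(provider, answer, round(score, 2))
```

Because the providers sit behind a plain dictionary of callables, adding or replacing one means changing a single entry, which is the kind of vendor independence the orchestration approach is said to offer.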

Reactions in the AI community have been sharply divided. Some engineers praise the engineering elegance of model federation, likening it to Kaggle ensembles that routinely win competitions. Others contend that the achievement is more marketing than innovation, arguing that credit should go to the underlying models rather than the orchestration layer. This controversy underscores a broader question: should benchmark leadership be awarded to the creator of the model, the integrator, or both? As AI evaluation matures, the industry may need new metrics that recognize system‑level ingenuity alongside raw model capability.

Strategically, Zoom’s approach signals a shift toward AI as a service‑orchestration platform rather than a pure model‑building play. By abstracting away vendor dependencies, Zoom can swap in newer models—such as OpenAI’s upcoming releases—without rebuilding its product stack, offering customers the best available capabilities while avoiding lock‑in. If the company can translate benchmark gains into tangible productivity tools for its 300 million users, the federated model could become a template for other enterprise software firms seeking to leverage AI without the massive R&D spend required for frontier model development.
