The Leaderboard 'You Can't Game,' Funded by the Companies It Ranks | Equity Podcast

TechCrunch
Mar 18, 2026

Why It Matters

Arena’s live, user‑driven leaderboard has become one of the most trusted benchmarks for frontier AI, shaping product choices and investment flows while pushing developers to improve genuine utility rather than test‑set performance.

Key Takeaways

  • Arena offers real‑world, continuously refreshed LLM evaluation data.
  • Platform avoids overfitting by leveraging millions of user interactions.
  • Funding from top AI labs raises neutrality concerns, but architecture claims independence.
  • Diverse user base spans coding, legal, medical, and creative tasks.
  • Open‑source pipeline provides reproducible leaderboards with confidence intervals.

Summary

The Equity podcast episode spotlights Arena, the de‑facto public leaderboard that ranks frontier large language models (LLMs) and emerging AI agents. Founded by former UC Berkeley PhDs Anastasios Angelopoulos and Wei-Lin Chiang, the platform evolved from a research prototype called Chatbot Arena into a venture‑backed company now valued at $1.7 billion.

Arena differentiates itself by collecting tens of millions of real‑world user interactions rather than relying on static test sets. Each day hundreds of thousands of conversations—spanning coding, legal, medical, marketing and more—feed pairwise preference data that the open‑source pipeline converts into a continuously updating leaderboard. This dynamic feed prevents overfitting and yields statistical confidence intervals that converge as data volume grows.
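A common way to turn pairwise preference votes into a ranked leaderboard with confidence intervals is the Bradley–Terry model, which the original Chatbot Arena research describes. The sketch below is illustrative only, not Arena's production pipeline: it fits model strengths from (winner, loser) vote pairs and estimates 95% confidence intervals by bootstrap resampling.

```python
import random
from collections import defaultdict

def bradley_terry(battles, iters=100):
    """Fit Bradley-Terry strengths from (winner, loser) vote pairs
    using the standard minorization-maximization update."""
    models = {m for pair in battles for m in pair}
    wins = defaultdict(int)    # total wins per model
    pairs = defaultdict(int)   # number of battles per unordered pair
    for winner, loser in battles:
        wins[winner] += 1
        pairs[frozenset((winner, loser))] += 1
    p = {m: 1.0 for m in models}
    for _ in range(iters):
        new = {}
        for m in models:
            # Sum over every opponent m has faced
            denom = sum(n / (p[m] + p[other])
                        for key, n in pairs.items() if m in key
                        for other in key if other != m)
            new[m] = wins[m] / denom if denom else p[m]
        norm = sum(new.values())           # normalize so mean strength is 1
        p = {m: v * len(models) / norm for m, v in new.items()}
    return p

def bootstrap_ci(battles, n_boot=200, seed=0):
    """95% confidence intervals for each model's strength via
    bootstrap resampling of the vote data; intervals narrow as
    the number of votes grows."""
    rng = random.Random(seed)
    samples = defaultdict(list)
    for _ in range(n_boot):
        resampled = [rng.choice(battles) for _ in battles]
        for m, s in bradley_terry(resampled).items():
            samples[m].append(s)
    return {m: (sorted(v)[int(0.025 * len(v))],
                sorted(v)[int(0.975 * len(v))])
            for m, v in samples.items()}
```

With only two models, the fitted strengths recover the empirical win rate exactly; with many models, the same update ranks them even when not every pair has battled directly.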

The hosts cite usage figures: 28% of conversations involve coding, roughly half fall under software engineering or creative work, and legal and medical tasks account for 6% each, illustrating a broad, economically valuable user base. Arena’s neutrality is built into its architecture: models must be publicly available, scores are generated automatically from user votes, and no money can buy placement on the public leaderboard.

For developers, investors and enterprise customers, Arena offers a trusted signal of which model delivers real‑world utility, influencing funding decisions, product launches and PR cycles. Its growth also raises governance questions about bias, demographic representation, and the influence of backers who are also competitors, making transparent, reproducible evaluation a strategic imperative for the AI ecosystem.

Original Description

Artificial intelligence models are multiplying fast, and competition is stiff. With so many players crowding the space, which one will be the best — and who decides that? Arena, formerly LM Arena, has emerged as the de facto public leaderboard for frontier LLMs, influencing funding, launches, and PR cycles. In just seven months, the startup went from a UC Berkeley PhD research project to being valued at $1.7 billion. 
Watch as Equity host Rebecca Bellan catches up with Arena co-founders Anastasios Angelopoulos and Wei-Lin Chiang about how their platform became the go-to leaderboard for frontier AI models, and how they’re trying to build a neutral benchmark even as companies like OpenAI, Google, and Anthropic back the project.
Subscribe to Equity on YouTube, Apple Podcasts, Overcast, Spotify and all the casts. You also can follow Equity on X and Threads, at @EquityPod.
Chapters:
00:00 Intro
03:00 How Arena's leaderboard works, and why it's different from static benchmarks
07:00 Reproducibility concerns and how to scale
08:45 Can Arena stay independent while taking money from the labs it ranks?
11:15 Diversity, fraud prevention, and abuse mitigation
18:15 Arena's "data moat"
19:20 Agent benchmarking and expert leaderboards
21:40 Open sourcing data
22:45 How do Arena's rankings shape AI development?
24:15 Outro
