If Sutskever's diagnosis is correct, current evaluation and development practices could be producing models that look powerful in the lab but underperform in practice, with implications for investment priorities, deployment risk, and how companies and regulators judge progress. A shift toward research that improves transfer and real-world robustness would also shape where capital and talent flow next.
OpenAI cofounder Ilya Sutskever argues the field is shifting from an era of pure scaling to one dominated by targeted research, and he notes a paradox: models score exceptionally well on benchmarks, yet their real-world economic impact remains muted. He suggests this gap may stem from reinforcement-learning fine-tuning that overfits to evaluation tasks, or from inadequate generalization despite vast pretraining data. Sutskever uses a competitive-programming analogy to illustrate how narrow, intensive training can produce superhuman test performance without broader judgment or transferability. He urges the field to develop richer training environments or methods that let models learn to generalize across tasks rather than optimize for benchmarks alone.