He Co-Invented the Transformer. Now: Continuous Thought Machines [Llion Jones / Luke Darlow]

Machine Learning Street Talk • November 23, 2025

Why It Matters

The Continuous Thought Machine promises a fundamentally new AI architecture that could break the transformer monopoly, restoring research diversity and enabling more efficient, human‑like reasoning, which would impact both technological progress and the economics of AI development.

Summary

The video features Llion Jones, a co-inventor of the Transformer architecture, discussing his shift away from transformer research toward a new paradigm he calls the Continuous Thought Machine (CTM). He explains that the transformer space has become oversaturated, prompting his company to explore adaptive-compute recurrent models that draw on higher-level neuronal concepts and synchronization mechanisms, aiming for more human-like, biologically inspired reasoning. The CTM, unveiled by Sakana AI in 2025, is positioned as a potential successor to the transformer era.
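Since the summary stays high level, here is a minimal numpy sketch of the two mechanisms it attributes to the CTM: per-neuron models that read a short history of their own pre-activations across internal "ticks", and a pairwise synchronization readout computed over those ticks. All sizes, update rules, and names below are illustrative assumptions rather than Sakana's implementation; the actual architecture is described in the CTM paper linked under REFS.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16          # number of neurons (illustrative)
M = 5           # pre-activation history length each neuron sees (illustrative)
TICKS = 20      # internal "thinking" steps, decoupled from the input sequence

# One tiny weight vector per neuron: each neuron maps its own recent
# pre-activation history to its next post-activation ("neuron-level model").
neuron_models = rng.normal(size=(D, M)) / np.sqrt(M)

# A shared mixing matrix producing pre-activations from the previous
# post-activations (a stand-in for whatever synapse model is actually used).
W_mix = rng.normal(size=(D, D)) / np.sqrt(D)

history = np.zeros((D, M))   # rolling window of pre-activations per neuron
post = np.zeros(D)           # current post-activations
trace = []                   # post-activations recorded at every tick

for _ in range(TICKS):
    pre = np.tanh(W_mix @ post)                                  # mix last outputs
    history = np.concatenate([history[:, 1:], pre[:, None]], axis=1)
    post = np.tanh(np.sum(neuron_models * history, axis=1))      # per-neuron model
    trace.append(post.copy())

Z = np.stack(trace)          # (TICKS, D) activation time series

# "Synchronization" readout: how correlated each pair of neurons was across
# ticks; the upper triangle becomes the feature vector a task head would read.
sync = (Z.T @ Z) / TICKS                 # (D, D) pairwise synchronization
readout = sync[np.triu_indices(D)]       # flattened features
print(readout.shape)                     # (136,)
```

The only point of the sketch is that the features handed to a downstream head come from timing relationships between neurons across ticks, not from a single hidden vector at one step.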

Jones critiques the current AI research climate, noting that the rapid commercial success of transformers has funneled funding and talent into incremental tweaks rather than fundamentally new architectures. He draws parallels to the RNN era, where once‑promising innovations were rendered obsolete by transformers, and warns that a similar stagnation may be occurring now. He argues that true breakthroughs require freedom from corporate and academic pressures, citing Kenneth Stanley’s philosophy of unfettered epistemic foraging and the need to protect researchers’ autonomy.

The conversation also touches on industry dynamics: large firms like OpenAI and Google are reluctant to adopt architectures that are merely better, demanding "crushingly" superior performance to justify abandoning the entrenched transformer ecosystem. Jones highlights the "technology capture" phenomenon, where commercial imperatives constrain exploratory work, and stresses that the CTM aims to integrate adaptive computation, uncertainty quantification, and reasoning intrinsically rather than as bolted-on features. He warns that without such foundational shifts, the field risks staying stuck in a basin of attraction, repeating past cycles of hype and redundancy.

If successful, the Continuous Thought Machine could re‑introduce architectural diversity, revitalize research freedom, and provide a more efficient, interpretable alternative to massive scaling of transformers. This would have broad implications for AI product development, talent recruitment, and the strategic direction of both startups and established labs, potentially reshaping the next wave of AI capabilities beyond sheer parameter count.

Original Description

The Transformer architecture (which powers ChatGPT and nearly all modern AI) might be trapping the industry in a localized rut, preventing us from finding true intelligent reasoning, according to the person who co-invented it. Llion Jones and Luke Darlow, key figures at the research lab Sakana AI, join the show to make this provocative argument, and also introduce new research (CTM) which might lead the way forwards.
We speak about "Inventor's Remorse" & The Trap of Success
Despite being one of the original authors of the famous "Attention Is All You Need" paper that gave birth to the Transformer, Llion explains why he has largely stopped working on them. He argues that the industry is suffering from "success capture": because Transformers work so well, everyone is focused on making small tweaks to the same architecture rather than discovering the next big leap.
The "Spiral" Problem – Llion uses a striking visual analogy to explain what current AI is missing. If you ask a standard neural network to understand a spiral shape, it solves it by drawing tiny straight lines that just happen to look like a spiral. It "fakes" the shape without understanding the concept of spiraling. They argue that today's AI models are similar—they are incredible at mimicking intelligent answers without having an internal process of "thinking".
Introducing the Continuous Thought Machine (CTM) Luke Darlow deep dives into their solution: a biology-inspired model that fundamentally changes how AI processes information.
The Maze Analogy: Luke explains that standard AI tries to solve a maze by staring at the whole image and guessing the entire path instantly. Their new machine "walks" through the maze step-by-step.
Thinking Time: This allows the AI to "ponder." If a problem is hard, the model can naturally spend more time thinking about it before answering, effectively allowing it to correct its own mistakes and backtrack—something current Language Models struggle to do genuinely.
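A toy pondering loop illustrates the "thinking time" control flow, in the spirit of Graves's Adaptive Computation Time (linked under REFS) rather than the CTM's actual halting mechanism: the model keeps applying an internal recurrent update until its own output confidence crosses a threshold, so easy inputs can halt early while hard ones get more internal steps. The weights here are random and untrained; the sketch only demonstrates the mechanics.

```python
import numpy as np

rng = np.random.default_rng(1)

D, CLASSES = 32, 4
W_rec = rng.normal(size=(D, D)) / np.sqrt(D)       # recurrent "pondering" weights
W_in  = rng.normal(size=(D, D)) / np.sqrt(D)       # input projection
W_out = rng.normal(size=(CLASSES, D)) / np.sqrt(D) # classification head

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ponder(x, max_steps=50, confidence=0.95):
    """Refine the hidden state until the prediction is confident enough."""
    h = np.tanh(W_in @ x)
    step = 0
    for step in range(1, max_steps + 1):
        h = np.tanh(W_rec @ h + W_in @ x)          # one more internal "thought"
        p = softmax(W_out @ h)
        if p.max() >= confidence:                  # halt once confident enough
            break
    return p, step                                 # harder inputs use more steps

probs, steps_used = ponder(rng.normal(size=D))
print("steps used:", steps_used, "confidence:", round(float(probs.max()), 3))
```

In a trained model, those extra internal steps are where the self-correction and backtracking described above would have room to happen.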
The pair discuss the culture of Sakana AI, which is modeled after the early days of Google Brain/DeepMind. Llion nostalgically recalls that the Transformer wasn't born from a corporate mandate, but from random people talking over lunch about interesting problems.
https://sakana.ai/
https://x.com/YesThisIsLion
https://x.com/LearningLukeD
TRANSCRIPT:
https://app.rescript.info/public/share/crjzQ-Jo2FQsJc97xsBdfzfOIeMONpg0TFBuCgV2Fu8
TOC:
00:00:00 - Stepping Back from Transformers
00:00:43 - Introduction to Continuous Thought Machines (CTM)
00:01:09 - The Changing Atmosphere of AI Research
00:04:13 - Sakana’s Philosophy: Research Freedom
00:07:45 - The Local Minimum of Large Language Models
00:18:30 - Representation Problems: The Spiral Example
00:29:12 - Technical Deep Dive: CTM Architecture
00:36:00 - Adaptive Computation & Maze Solving
00:47:15 - Model Calibration & Uncertainty
01:00:43 - Sudoku Bench: Measuring True Reasoning
REFS:
Why Greatness Cannot Be Planned [Kenneth Stanley]
https://www.amazon.co.uk/Why-Greatness-Cannot-Planned-Objective/dp/3319155237
https://www.youtube.com/watch?v=lhYGXYeMq_E
The Hardware Lottery [Sara Hooker]
https://arxiv.org/abs/2009.06489
https://www.youtube.com/watch?v=sQFxbQ7ade0
Continuous Thought Machines [Luke Darlow et al / Sakana]
https://arxiv.org/abs/2505.05522
https://sakana.ai/ctm/
https://youtu.be/5X9cjGLggv0 (walkthrough of the algorithm by Yacine Mahdid)
LSTM: The Comeback Story? [Prof. Sepp Hochreiter]
https://www.youtube.com/watch?v=8u2pW2zZLCs
Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis [Kumar/Stanley]
https://arxiv.org/pdf/2505.11581
Intelligent Matrix Exponentiation [Thomas Fischbacher] (Spiral reference)
https://arxiv.org/abs/2008.03936
A Spline Theory of Deep Networks [Randall Balestriero]
https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf
https://www.youtube.com/watch?v=86ib0sfdFtw
https://www.youtube.com/watch?v=l3O2J3LMxqI
On the Biology of a Large Language Model [Anthropic, Jack Lindsey et al]
https://transformer-circuits.pub/2025/attribution-graphs/biology.html
The ARC Prize 2024 Winning Algorithm [Daniel Franzen and Jan Disselhoff] “The ARChitects”
https://www.youtube.com/watch?v=mTX_sAq--zY
Neural Turing Machine [Graves]
https://arxiv.org/pdf/1410.5401
Adaptive Computation Time for Recurrent Neural Networks [Graves]
https://arxiv.org/abs/1603.08983
Sudoku Bench [Sakana]
https://pub.sakana.ai/sudoku/