Chip Design From the Bottom up – Reiner Pope

Dwarkesh Patel · May 22, 2026

Why It Matters

Understanding the low‑level building blocks of chips clarifies why data movement, architecture choice, and specialization drive AI compute costs, informing both engineers and investors about future hardware investments.

Key Takeaways

  • Logic gates combine to form multiply‑accumulate units
  • Data movement cost dominates performance beyond raw compute
  • Systolic arrays enable efficient tensor operations for AI workloads
  • FPGAs offer flexibility; ASICs deliver higher efficiency at volume
  • Brains, like chips, rely on distributed, low‑precision processing

Pulse Analysis

Starting a chip design conversation at the transistor level may seem academic, but it reveals the true sources of performance and cost. Pope’s step‑by‑step construction of a multiply‑accumulate circuit shows how a handful of NAND gates evolve into the arithmetic cores that power modern AI accelerators. By emphasizing the cost of moving bits across a die, he underscores a shift in optimization focus: engineers now prioritize bandwidth and latency reductions just as much as raw FLOPS, a reality that reshapes silicon roadmaps across the industry.
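The gate‑to‑arithmetic progression described above can be sketched in a few lines of Python. This is an illustrative model rather than the circuit drawn in the lecture: every function name here is hypothetical, the operands are assumed to be unsigned 4‑bit integers with a 16‑bit accumulator, and the multiplier uses a simple shift‑and‑add scheme chosen for readability, not efficiency.

```python
# Everything below is built from a single primitive: a 2-input NAND gate.
def nand(a: int, b: int) -> int:
    return 1 - (a & b)

def not_(a: int) -> int:
    return nand(a, a)

def and_(a: int, b: int) -> int:
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))

def xor_(a: int, b: int) -> int:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Add three bits; return (sum bit, carry-out bit)."""
    partial = xor_(a, b)
    total = xor_(partial, cin)
    cout = or_(and_(a, b), and_(partial, cin))
    return total, cout

def to_bits(x: int, width: int) -> list[int]:
    """Little-endian bit list: index 0 is the least significant bit."""
    return [(x >> i) & 1 for i in range(width)]

def from_bits(bits: list[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))

def add(a_bits: list[int], b_bits: list[int]) -> list[int]:
    """Ripple-carry adder over equal-width bit lists; the final carry is dropped."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out

def multiply(a_bits: list[int], b_bits: list[int], width: int) -> list[int]:
    """Shift-and-add: AND each bit of b with a shifted copy of a, then sum."""
    acc = [0] * width
    for shift, b in enumerate(b_bits):
        partial = [0] * shift + [and_(a, b) for a in a_bits]
        partial = (partial + [0] * width)[:width]  # pad or trim to width
        acc = add(acc, partial)
    return acc

def mac(acc: int, a: int, b: int, in_width: int = 4, acc_width: int = 16) -> int:
    """Multiply-accumulate: return acc + a * b, computed only with NAND gates."""
    product = multiply(to_bits(a, in_width), to_bits(b, in_width), acc_width)
    return from_bits(add(to_bits(acc, acc_width), product))

print(mac(10, 3, 5))
```

Calling `mac(10, 3, 5)` computes 10 + 3 × 5 using nothing but NAND operations. Even this toy version spends most of its lines routing and padding bits rather than computing on them, which hints at why data movement becomes the dominant cost.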

The lecture’s middle section demystifies why GPUs, TPUs, and FPGAs look the way they do. Systolic arrays, a hallmark of Google’s TPU, arrange processing elements in a grid so that operands stream from one neighbor to the next with minimal shuffling, which keeps data movement short and cheap. Pope frames a GPU as, in effect, a collection of many tiny TPU‑like units, while FPGAs provide reconfigurable pipelines at some cost in efficiency and ASICs give up that flexibility in exchange for efficiency at volume. AI hardware choices therefore come down largely to a trade‑off between flexibility and efficiency around the same core tensor operations that drive deep‑learning performance.
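To make the streaming idea concrete, here is a minimal cycle‑by‑cycle simulation of a systolic array performing a matrix multiply. It assumes an output‑stationary dataflow, in which each processing element keeps its own running sum while values of A move right and values of B move down. That choice is made purely for illustration; the lecture and real TPUs may use a different dataflow (for example, weight‑stationary), and the function name `systolic_matmul` is hypothetical.

```python
def systolic_matmul(A: list[list[int]], B: list[list[int]]) -> list[list[int]]:
    """Simulate an n x m grid of processing elements (PEs) computing A @ B.

    PE (i, j) accumulates C[i][j]. Each cycle it multiplies the value arriving
    from its left neighbor by the value arriving from above, adds the product
    to its accumulator, and forwards both inputs onward.
    """
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]
    a_reg = [[0] * m for _ in range(n)]  # A values latched in each PE last cycle
    b_reg = [[0] * m for _ in range(n)]  # B values latched in each PE last cycle

    # The last useful product reaches the bottom-right PE at cycle n + m + k - 3.
    for t in range(n + m + k - 2):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:
                    # Left edge: row i of A is fed in, delayed by i cycles (skew).
                    s = t - i
                    a_in = A[i][s] if 0 <= s < k else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    # Top edge: column j of B is fed in, delayed by j cycles.
                    s = t - j
                    b_in = B[s][j] if 0 <= s < k else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b
    return acc

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each PE only ever reads from its immediate left and upper neighbors, so no value travels more than one hop per cycle. The skewed feeding at the edges ensures that matching elements of A and B meet in the right PE at the right time.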

Finally, Pope draws a provocative analogy between silicon and the human brain, noting that both rely on massive parallelism and low‑precision arithmetic to conserve energy. This perspective fuels the next wave of neuromorphic and in‑memory computing research, where chips emulate synaptic behavior to push beyond the limits of traditional von Neumann designs. MatX, Pope’s new venture, aims to capitalize on these insights by delivering custom accelerators that blend TPU‑style systolic efficiency with FPGA‑style adaptability, positioning the startup at the forefront of a market projected to exceed $150 billion by 2030. The lecture thus offers a roadmap for investors and technologists seeking to navigate the rapidly evolving hardware landscape.

Original Description

New blackboard lecture with Reiner Pope: how do chips actually work - starting with basic logic gates, and working up to why GPUs, TPUs, FPGAs, and the human brain each look the way they do.
Reiner is CEO of MatX, a new chip startup (full disclosure - I’m an angel investor). He was previously at Google, where he worked on software efficiency, compilers, and TPU architecture.
𝐒𝐏𝐎𝐍𝐒𝐎𝐑𝐒
* Crusoe was one of only five GPU clouds that made the gold tier in SemiAnalysis' most recent ClusterMAX report. Gold-tier providers like Crusoe delivered 5-15% lower TCO than silver-tier clouds, even with identical GPU pricing. This is because optimizations like early fault detection and rapid node replacement don't necessarily show up in the sticker price, but still matter a ton in the real world. Learn more at https://crusoe.ai/dwarkesh
* Cursor is where I do most of my work—from reading research papers to visualizing technical concepts to coding up internal tools for the podcast. Most recently, I used it to build two different review interfaces for my essay contest, one that anonymizes submissions for scoring and another that lets me see applicants' essays next to their resumes and websites. Whatever you're working on, you should try doing it in Cursor. Get started at https://cursor.com/dwarkesh
* Jane Street let me ask Ron Minsky and Dan Pontecorvo, two senior Jane Streeters, a bunch of questions about how they use AI. We discussed everything from the types of models they're training to how they think about the future of trading to why they're more bullish than ever on hiring technical talent. You can watch the full conversation and learn more about their open positions at https://janestreet.com/dwarkesh
𝐓𝐈𝐌𝐄𝐒𝐓𝐀𝐌𝐏𝐒
00:00:00 – Building a multiply-accumulate from logic gates
00:16:20 – Muxes and the cost of data movement
00:25:59 – How systolic arrays work
00:39:00 – Clock cycles and pipeline registers
00:51:40 – FPGAs vs ASICs
01:03:14 – Cache vs scratchpad
01:07:16 – Why CPU cores are much bigger than GPU cores
01:11:49 – Brains vs chips
01:15:22 – A GPU is just a bunch of tiny TPUs
