Groq, Etched, SambaNova, Taalas // The AI Hardware Show S2E4

TechTechPotato (Ian Cutress)
Apr 6, 2026

Why It Matters

Specialized inference chips promise dramatically lower latency and cost, forcing data‑center operators and investors to rethink the balance between flexibility and performance in AI deployments.

Key Takeaways

  • Groq’s LPU architecture offers deterministic inference via on‑chip SRAM.
  • Etched’s Sohu ASIC trades flexibility for a transformer‑only speed advantage.
  • Neuchips’ Raptor targets low‑latency enterprise inference with moderate throughput.
  • SambaNova’s SN40L combines massive SRAM and DDR for trillion‑parameter models.
  • Taalas and Positron pursue extreme specialization via model‑compiled silicon and FPGAs.

Summary

The AI Hardware Show episode dives deep into the rapidly evolving LLM inference market, profiling a suite of startups that are redefining data‑center acceleration. Hosts Sally Ward-Foxton and Ian Cutress outline why inference at scale is the next cash‑flow engine, noting that dozens of unicorns are racing to lock down deterministic performance, power efficiency, and cost advantages. Key insights include Groq’s Language Processing Unit, a 14 nm chip that eliminates caches, DRAM, and out‑of‑order execution to guarantee compile‑time latency, and its upcoming 4 nm, stacked‑DRAM successor funded by a $700 million Series D.

Etched’s Sohu ASIC, built on TSMC’s 4 nm node, forgoes all flexibility to run transformers exclusively, claiming 500k Llama 70B tokens per second, an order of magnitude ahead of Nvidia’s Blackwell. Meanwhile, Neuchips’ Raptor accelerator pairs a modest 8–10 tokens per second per chip with on‑device vector search, targeting enterprise workloads where power and latency trump raw throughput. SambaNova’s SN40L leverages a coarse‑grained reconfigurable array, 520 MB of SRAM, and 64 GB of HBM to serve multi‑trillion‑parameter models with microsecond model switching, sold as a fully integrated rack.

Taalas bets on a “HardCore” model‑as‑silicon approach, recompiling each model onto a custom chip for thousand‑fold efficiency gains, while Positron’s FPGA‑based Atlas card promises 70% faster token rates than Nvidia Hopper by exploiting HBM‑enabled Altera Agilex FPGAs. Notable quotes underscore the stakes: Groq’s acquisition by Nvidia was announced on Christmas Eve 2025, Etched’s CEO admits, “If transformers lose, we lose,” and Taalas’s founder emphasizes eliminating every runtime abstraction. Positron’s founders, former Groq engineers, tout 93% memory‑bandwidth utilization on DDR‑only ASICs as a path to competitive performance without HBM.

These profiles illustrate the spectrum from ultra‑flexible GPUs to single‑purpose ASICs, each carving a niche in the inference hierarchy. The implications are clear: investors must choose between flexibility and peak efficiency, while hyperscalers weigh deterministic latency against the risk of architectural lock‑in. As power‑hungry GPUs approach diminishing returns, specialized silicon, whether deterministic LPUs, transformer‑only ASICs, or model‑compiled chips, could reshape AI infrastructure economics, driving down cost per token and enabling new edge‑centric generative applications.
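Positron’s 93% memory‑bandwidth figure matters because single‑batch LLM decoding is typically memory‑bound: each generated token requires streaming the model weights once, so throughput is roughly effective bandwidth divided by bytes per token. A minimal back‑of‑envelope sketch of that relationship; all numbers below are illustrative assumptions, not figures from the episode:

```python
# Back-of-envelope decode throughput for a memory-bound accelerator.
# Assumption: each token streams all weights once from memory, so
# tokens/sec ~= (peak bandwidth * utilization) / bytes per token.

def decode_tokens_per_sec(peak_bw_gb_s: float, utilization: float,
                          params_billions: float, bytes_per_param: float) -> float:
    """Rough single-batch decode rate under the memory-bound assumption."""
    bytes_per_token = params_billions * 1e9 * bytes_per_param
    effective_bw = peak_bw_gb_s * 1e9 * utilization
    return effective_bw / bytes_per_token

# Hypothetical DDR-only card: 400 GB/s peak, 93% utilization,
# serving a 70B-parameter model quantized to 1 byte per weight.
print(round(decode_tokens_per_sec(400, 0.93, 70, 1.0), 1))  # ~5.3 tokens/sec
```

The same arithmetic shows why high utilization on cheap DDR can rival poorly utilized HBM: a card extracting 93% of 400 GB/s delivers more real decode bandwidth than one extracting 10% of 3 TB/s.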

Original Description

Now we dive into the high-stakes world of LLM inference at the data center scale. From the "LPU" architecture of Groq, recorded just before its surprise acquisition by NVIDIA, to the radical transformer-only ASIC strategy from Etched. We explore how startups like Taalas are hardening specific models directly into silicon for 1000x efficiency gains, how Neuchips is finding its niche where GPUs are overkill, and why SambaNova remains the sole champion of RDU/CGRA architectures in the data center. Finally, we look at Positron, the stealthy newcomer utilizing Intel FPGAs to outperform NVIDIA Hopper systems while preparing a DDR-only ASIC for 2026.
Want to see more from Sally? Check out her recent industry coverage on EE Times: https://www.eetimes.com/author/sally-ward-foxton/?utm_source=ian_youtube&utm_medium=social&utm_campaign=aihardwareshow
Stay up to date on the latest news in the electronics industry with the EE Times newsletter. Subscribe Now: https://aspencore.dragonforms.com/loading.do?version=0&page=1&omedasite=EET_Subscribe_2025&pk=aihardwareshow
[00:00] Intro
[00:42] Groq One
[03:38] Etched Sohu
[06:10] Neuchips Raptor
[08:39] SambaNova SN40L
[11:01] Taalas HardCore
[13:34] Positron Atlas
-----------------------
Need POTATO merch? There's a chip for that!
http://more-moore.com : Sign up to the More Than Moore Newsletter
https://www.patreon.com/TechTechPotato : Patreon gets you access to the TTP Discord server!
Follow Ian on Twitter at http://twitter.com/IanCutress
Follow TechTechPotato on Twitter at http://twitter.com/TechTechPotato
If you're in the market for something from Amazon, please use the following links. TTP may receive a commission if you purchase anything through these links.
-----------------------
Welcome to the TechTechPotato (c) Dr. Ian Cutress
Ramblings about things related to Technology from an analyst for More Than Moore
#techtechpotato #eetimes #aihardwareshow
------------
More Than Moore, as with other research and analyst firms, provides or has provided paid research, analysis, advising, or consulting to many high-tech companies in the industry, which may include advertising on the More Than Moore newsletter or TechTechPotato YouTube channel and related social media. The companies that fall under this banner include AMD, Applied Materials, Arm, Armari, ASM, Ayar Labs, Baidu, Bolt Graphics, Dialectica, Facebook, GLG, Guidepoint, IBM, Impala, Infineon, Intel, Kuehne+Nagel, Lattice Semi, Linode, MediaTek, NeuReality, NextSilicon, NordPass, NVIDIA, ProteanTecs, Qualcomm, Recogni, SiFive, SIG, SiTime, Supermicro, Synopsys, Tenstorrent, Third Bridge, TSMC, Untether AI, Ventana Micro.
