Video•Apr 6, 2026
Groq, Etched, SambaNova, Taalas // The AI Hardware Show S2E4
The AI Hardware Show episode dives deep into the rapidly evolving LLM inference market, profiling a suite of startups that are redefining data‑center acceleration. Hosts Sally Ward-Foxton and Ian Cutress outline why inference at scale is the next cash‑flow engine, noting that dozens of unicorns are racing to lock down deterministic performance, power efficiency, and cost advantages.
Key insights include Groq’s Language Processing Unit, a 14 nm chip that eliminates caches, DRAM and out‑of‑order execution to guarantee compile‑time latency, and its upcoming 4 nm, stacked‑DRAM successor funded by a $700 million Series D. Etched’s Sohu ASIC, built on TSMC’s 4 nm node, forgoes all flexibility to run transformers exclusively, claiming 500,000 Llama 70B tokens per second, an order of magnitude ahead of Nvidia’s Blackwell. Meanwhile, NeuChips’ Raptor accelerator pairs a modest 8‑10 tokens per second per chip with on‑device vector search, targeting enterprise workloads where power and latency trump raw throughput. SambaNova’s SN40L leverages a coarse‑grained reconfigurable array, 520 MB of SRAM and 64 GB of HBM to serve multi‑trillion‑parameter models with microsecond model‑switching, sold as a fully integrated rack. Taalas bets on a hard‑coded “model‑as‑silicon” approach, recompiling each model onto a custom chip for thousand‑fold efficiency gains, while Positron’s FPGA‑based Atlas card promises 70 % faster token rates than Nvidia Hopper by exploiting HBM‑enabled Altera Agilex FPGAs.
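To make the compile‑time‑latency idea concrete, here is a minimal Python sketch of static scheduling. The operation names and cycle counts are hypothetical placeholders, and this is an illustration of the concept rather than Groq's actual compiler or ISA: when every operation has a fixed cost and there are no caches or out‑of‑order execution to introduce variance, end‑to‑end latency can be summed before the program ever runs.

```python
# Toy model of compile-time-deterministic scheduling. Operation names and
# cycle counts are hypothetical, not Groq's actual ISA or compiler output.
OP_CYCLES = {
    "matmul_qkv": 4096,
    "attention": 2048,
    "matmul_ffn": 8192,
    "layernorm": 256,
}

def compile_time_latency(schedule, clock_hz):
    """With no caches or out-of-order execution, per-op cycle counts are fixed,
    so total latency is a static sum known before execution begins."""
    total_cycles = sum(OP_CYCLES[op] for op in schedule)
    return total_cycles / clock_hz

# One decoder layer scheduled as a straight line of operations.
layer = ["matmul_qkv", "attention", "matmul_ffn", "layernorm"]

# 80 such layers at a hypothetical 1 GHz clock: a single deterministic number,
# identical on every run because nothing at runtime can perturb the schedule.
latency_s = 80 * compile_time_latency(layer, clock_hz=1e9)
print(f"Per-token latency: {latency_s * 1e3:.3f} ms")
```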
Notable moments underscore the stakes: Groq’s acquisition by Nvidia was announced on Christmas Eve 2025, Etched’s CEO admits, “If transformers lose, we lose,” and Taalas’s founder emphasizes eliminating every runtime abstraction. Positron’s founders, former Groq engineers, tout 93 % memory‑bandwidth utilization on DDR‑only ASICs as a path to competitive performance without HBM. These anecdotes illustrate the spectrum from highly flexible, general‑purpose processors to single‑purpose ASICs, each carving a niche in the inference hierarchy.
The implications are clear: investors must choose between flexibility and peak efficiency, while hyperscalers weigh deterministic latency against the risk of architectural lock‑in. As power‑hungry GPUs approach diminishing returns, specialized silicon—whether deterministic LPUs, transformer‑only ASICs, or model‑compiled chips—could reshape AI infrastructure economics, driving down cost per token and enabling new edge‑centric generative applications.
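Since the episode frames the endgame in cost‑per‑token terms, a back‑of‑the‑envelope model shows how throughput, power draw and capital cost interact. Every input below is a hypothetical placeholder, not a figure quoted in the show.

```python
# Back-of-the-envelope cost-per-token model. All inputs are hypothetical
# placeholders used only to illustrate the economics discussed above.

def cost_per_million_tokens(tokens_per_second, system_power_kw,
                            power_cost_per_kwh, system_price,
                            amortization_years):
    seconds_per_year = 365 * 24 * 3600
    tokens_per_year = tokens_per_second * seconds_per_year
    # Annual electricity cost: kW * hours/year * $/kWh.
    energy_cost = system_power_kw * (seconds_per_year / 3600) * power_cost_per_kwh
    # Straight-line amortization of the system's capital cost.
    capital_cost = system_price / amortization_years
    return (energy_cost + capital_cost) / tokens_per_year * 1e6

# Example: a hypothetical 10 kW system sustaining 50,000 tokens/s,
# priced at $500k and amortized over three years.
print(cost_per_million_tokens(
    tokens_per_second=50_000,
    system_power_kw=10,
    power_cost_per_kwh=0.08,
    system_price=500_000,
    amortization_years=3,
))  # roughly $0.11 per million tokens under these assumptions
```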
By TechTechPotato (Ian Cutress)