
The AI Hardware Show episode dives deep into the rapidly evolving LLM inference market, profiling a suite of startups that are redefining data-center acceleration. Hosts Sally Ward-Foxton and Ian Cutress outline why inference at scale is the next cash-flow engine, noting that dozens of unicorns are racing to lock down deterministic performance, power efficiency, and cost advantages.

Key profiles include Groq's Language Processing Unit, a 14 nm chip that eliminates caches, DRAM, and out-of-order execution to guarantee compile-time-known latency, and its upcoming 4 nm, stacked-DRAM successor funded by a $700 million Series D. Etched's Sohu ASIC, built on TSMC's 4 nm node, forgoes all flexibility to run transformers exclusively, claiming 500,000 Llama-70B tokens per second, an order of magnitude ahead of Nvidia's Blackwell. Meanwhile, Neuchips' Raptor accelerator pairs a modest per-chip throughput of 8-10 tokens per second with on-device vector search, targeting enterprise workloads where power and latency trump raw throughput. SambaNova's SN40L leverages a coarse-grained reconfigurable array, 520 MB of SRAM, and 64 GB of HBM to serve multi-trillion-parameter models with microsecond model switching, sold as a fully integrated rack. Taalas bets on a hard-core "model-as-silicon" approach, recompiling each model onto a custom chip for thousand-fold efficiency gains, while Positron's FPGA-based Atlas card promises 70% faster token rates than Nvidia's Hopper by exploiting HBM-equipped Altera Agilex FPGAs.

Notable moments underscore the stakes: Groq's acquisition by Nvidia was announced on Christmas Eve 2025; Etched's CEO admits, "If transformers lose, we lose"; and Taalas's founder emphasizes eliminating every runtime abstraction. Positron's founders, former Groq engineers, tout 93% memory-bandwidth utilization on DDR-only ASICs as a path to competitive performance without HBM.
These anecdotes illustrate the spectrum from ultra‑flexible CPUs to single‑purpose ASICs, each carving a niche in the inference hierarchy. The implications are clear: investors must choose between flexibility and peak efficiency, while hyperscalers weigh deterministic latency against the risk of architectural lock‑in. As power‑hungry GPUs approach diminishing returns, specialized silicon—whether deterministic LPUs, transformer‑only ASICs, or model‑compiled chips—could reshape AI infrastructure economics, driving down cost per token and enabling new edge‑centric generative applications.

The video chronicles the rise of Very Long Instruction Word (VLIW) architectures, a radical approach that promises computers up to twenty‑plus times faster without exotic silicon. By shifting the burden of parallelism from hardware to a sophisticated compiler, VLIW packs...
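The core VLIW idea described above, where the compiler rather than the hardware decides which operations can run side by side, can be sketched with a toy simulator. This is an illustrative model only, not any real ISA: each "bundle" holds operations the compiler has already proven independent, and the machine issues a whole bundle per cycle with no hardware dependency checking.

```python
# Toy VLIW model: the compiler pre-schedules independent operations into
# bundles; the "hardware" below blindly issues one bundle per cycle.

def run_vliw(bundles, regs):
    """Execute pre-scheduled bundles. Each bundle is a list of
    (dest, op, src1, src2) slots; all slots in a bundle read the
    register state as it stood at the start of the cycle, mimicking
    lockstep parallel issue with no interlocks."""
    for bundle in bundles:
        snapshot = dict(regs)  # register file at cycle start
        for dest, op, a, b in bundle:
            if op == "add":
                regs[dest] = snapshot[a] + snapshot[b]
            elif op == "mul":
                regs[dest] = snapshot[a] * snapshot[b]
    return regs

# Two independent ops share cycle 0; the dependent add is scheduled
# (by the "compiler", i.e. us) into cycle 1 because it needs t0 and t1.
regs = run_vliw(
    [
        [("t0", "add", "x", "y"), ("t1", "mul", "x", "y")],  # cycle 0
        [("t2", "add", "t0", "t1")],                          # cycle 1
    ],
    {"x": 2, "y": 3},
)
print(regs["t2"])  # (2+3) + (2*3) = 11
```

The point of the sketch is the division of labor: if the compiler had wrongly placed the dependent add in cycle 0, this machine would silently compute with stale values, which is exactly why VLIW shifts the correctness burden onto compiler scheduling.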

Intel unveiled its Nova Lake family, branded Ultra 5 and Ultra 7, positioned as ultra‑low‑cost CPUs aimed at budget‑conscious professionals rather than gamers. Benchmarks show the chips trail AMD’s Ryzen 5/7 equivalents at similar price points, especially in gaming, while offering...

The briefing spotlights how artificial‑intelligence workloads are turning high‑bandwidth memory (HBM) into a critical bottleneck for semiconductor manufacturers. HBM, once a niche component, now underpins the most powerful AI accelerators and is being ordered in volumes that dwarf traditional DRAM...

The video examines the recent reversal in memory pricing, highlighting a 30% decline in DDR5 costs as AI‑driven demand eases and manufacturers adjust inventories. It also teases early Zen 6 benchmark leaks that suggest a substantial performance jump for AMD’s next‑gen...