Tenstorrent Accelerates Big MoE Models with V3/V4
This is cool. @tenstorrent is digging in. Big models are a passion project for us. May 1st we announce big numbers on V3; V4 results after that. Networked AI is a natural fit for big MoE models.
Intel's RISCified X86-64 Beats AI Accelerators
Intel won by RISCifying their CISC. Still amazed how well x86-64 worked out. LPUs are a subset of AI, an accelerator; it's not the RISC. @tenstorrent, Trainium, Google TPU are closer. A clean tensor processor is step one. Then generality, memory...
TT‑Lang: Python DSL for Tenstorrent’s High‑performance Kernels
TT-Lang from Tenstorrent from Groq? It's a Python-based DSL that lets you write high-performance custom kernels and fused ops directly on Tensix cores, Blackhole, etc. Think "Triton, but made for Tenstorrent hardware." V1.0 next week.
Networked AI Shows Small Cables Power Big Systems
The network always wins (in big systems). Hook up small things with cheap cables. Distributed software is hard; it needs to really work. Very cool how this is working out. @tenstorrent Networked AI
Each New Accelerator Builds on the Previous
The accelerator accelerator accelerator.
CPUs are the OG.
GPUs accelerate CPUs.
TPUs accelerate GPUs (Tensor cores).
LPUs accelerate TPUs.
Maybe something simpler could work.
Accelerator Mismatch Turns Datacenters Into Bricks
Historically this doesn't work.
In chips we call it dead silicon.
Predicting dead datacenter.
Ratioing accelerators to the main computer is fragile.
Models change, size changes, and you have a brick.
Possibly $$$.
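The fragility argument above can be sketched with a toy back-of-the-envelope model. All numbers here are hypothetical, and `stranded_fraction` is an illustrative helper, not anything from a real deployment: a datacenter provisioned at a fixed accelerator-to-host ratio strands capacity as soon as the workload's compute mix shifts.

```python
# Toy model (hypothetical numbers): a datacenter built at a fixed
# accelerator-to-host ratio tuned for one model shape. When the model's
# compute/host-work mix shifts, one side idles -- "dead datacenter."

def stranded_fraction(accel_per_host: int, demand_accel_per_host: float) -> float:
    """Fraction of accelerator capacity left idle for a given workload mix."""
    if demand_accel_per_host >= accel_per_host:
        return 0.0  # hosts are the bottleneck; accelerators stay saturated
    return 1.0 - demand_accel_per_host / accel_per_host

# Provisioned at 8 accelerators per host for last year's model.
ratio = 8
print(stranded_fraction(ratio, 8.0))  # workload matches the ratio -> 0.0 stranded
print(stranded_fraction(ratio, 2.0))  # new model is host-heavy -> 0.75 stranded
```

The point of the sketch: the hardware ratio is fixed at build time, but the demand ratio moves with every model generation, so the idle fraction is a structural cost, not a tuning problem.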