Accel

Investor-Unified Profile

0 followers

Accel’s founder/VC conversations on company‑building, seed‑to‑growth fundraising, and market entry.

Video•May 26, 2026

FlashAttention-4 by Ted Zadouri X GPU MODE

The lecture introduced FlashAttention‑4, a co‑designed attention kernel targeting NVIDIA’s Blackwell B200 architecture. By re‑examining the forward and backward passes, the authors aligned algorithmic structure with Blackwell’s new tensor‑memory and fifth‑generation tensor‑core capabilities, addressing the incompatibilities of earlier FlashAttention versions. Key technical insights include exploiting the 2‑CTA MMA mode, which expands the M dimension to 256 and halves shared‑memory traffic, and leveraging tensor memory to alleviate register pressure. The forward pass now runs in a ping‑pong fashion, assigning two query tiles per CTA to overlap matrix‑multiply‑accumulate (MMA) work with softmax computation. A hybrid software‑emulated exponential—75 % hardware SFU, 25 % polynomial approximation—removes the exponential unit bottleneck, while a conditional scaling trick in the online softmax reduces non‑math operations. The presenters highlighted concrete performance numbers: the redesigned kernel matches softmax latency with MMA execution, and the software emulator executes in roughly nine SAS instructions, dramatically faster than a full SFU call. The 2‑CTA approach cuts shared‑memory footprint and SM bandwidth, and the slack‑based scaling maintains determinism with minimal numeric error, typically using a factor of eight (2⁸≈256) for BF16 stability. These innovations translate into up to 2‑3× speedups for large‑scale transformer workloads on Blackwell GPUs, lowering training costs and enabling higher token throughput. The broader lesson—mapping under‑utilized GPU functional units to algorithmic steps—offers a template for future kernel optimizations across emerging GPU architectures.

By Accel

Video•May 8, 2026

BONUS: Nebius’s Arkady Volozh | On Building in the Agentic Era | Spotlight On | AccelVC

In this episode of Spotlight On, Arkady Volozh of Nebius outlines the company’s rapid expansion in AI infrastructure, highlighted by massive contracts with Meta and Microsoft, a deepening engineering partnership with Nvidia, and a multi‑billion‑dollar financing strategy. Volozh explains how...

By Accel

Accel

FlashAttention-4 by Ted Zadouri X GPU MODE

BONUS: Nebius’s Arkady Volozh | On Building in the Agentic Era | Spotlight On | AccelVC

Technology Pulse