Subquadratic Launches SubQ, a 12M-Token AI Model for Long-Context Tasks

eWeek • May 6, 2026

Why It Matters

By eliminating the need for chunking, retrieval‑augmented pipelines and complex agent orchestration, SubQ could dramatically lower the cost and latency of long‑context AI applications, opening new possibilities for enterprise document analysis, codebase understanding, and other data‑intensive use cases.

Summary

Subquadratic emerged from stealth with a $25 million seed round and unveiled SubQ, the first commercial LLM built on a Subquadratic Selective Attention (SSA) architecture. SSA scales linearly with context length, and SubQ offers a native 12 million‑token context window at roughly one‑fifth the cost of leading models. The model runs 52× faster than FlashAttention at 1 million tokens and achieved 97% accuracy on the long‑context RULER benchmark for $8 per run, versus about $2,600 for frontier models. It also outperformed Opus, GPT‑5.4 and Gemini on multi‑needle retrieval tasks. SubQ is already available through an API with the full 12 million‑token window and a CLI tool that can ingest an entire code repository in a single pass, and the company plans to expand the window to 100 million tokens by Q4.
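
To put the quoted figures in perspective, the short Python sketch below works through two pieces of arithmetic: the ratio between the two RULER run costs cited in the summary, and how much the attention work would grow when context goes from 1 million to 12 million tokens under an idealized quadratic model versus an idealized linear one. The function names and the pure n² and n cost models are assumptions made for illustration; they are not taken from Subquadratic and do not attempt to reproduce the 52× speedup figure, which depends on implementation details the article does not give.

```python
# Figures quoted in the summary (USD per RULER run).
SUBQ_RULER_COST_USD = 8
FRONTIER_RULER_COST_USD = 2_600


def cost_ratio(frontier_usd: float, subq_usd: float) -> float:
    """How many times more expensive the frontier run is than the SubQ run."""
    return frontier_usd / subq_usd


def work_growth(tokens_from: int, tokens_to: int, exponent: int) -> float:
    """Relative growth in work when context grows from tokens_from to
    tokens_to, ASSUMING work is proportional to n**exponent.

    exponent=2 is an idealized model of standard (quadratic) attention;
    exponent=1 is an idealized model of a linearly scaling architecture.
    """
    return (tokens_to / tokens_from) ** exponent


ratio = cost_ratio(FRONTIER_RULER_COST_USD, SUBQ_RULER_COST_USD)
quadratic_growth = work_growth(1_000_000, 12_000_000, exponent=2)
linear_growth = work_growth(1_000_000, 12_000_000, exponent=1)

print(f"RULER cost ratio (frontier / SubQ): {ratio:.0f}x")
print(f"Work growth 1M -> 12M tokens, quadratic model: {quadratic_growth:.0f}x")
print(f"Work growth 1M -> 12M tokens, linear model: {linear_growth:.0f}x")
```

Under these assumptions the cited RULER costs differ by a factor of 325, and a twelvefold increase in context multiplies the work by 144 in the quadratic model but only by 12 in the linear one, which is the gap a linearly scaling architecture is meant to close.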
