
The Sequence Knowledge #854: Return of the King: Unrolling the xLSTM Architecture
xLSTM reintroduces recurrent gating into modern sequence models, combining LSTM’s memory control with block‑wise attention for parallel execution. The architecture delivers up to 30% lower memory usage and roughly double the training speed of comparable Transformers while matching perplexity on standard language benchmarks. An open‑source implementation with pre‑trained weights is now available, enabling rapid adoption. This hybrid design signals a shift back toward efficient, long‑range modeling after the transformer‑only era.
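To make the "recurrent gating" idea concrete, here is a minimal sketch of one step of a classic scalar LSTM cell. It is the textbook LSTM update, not xLSTM itself, and the weight dictionary, gate names, and toy values are assumptions made purely for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function used for the three gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: float, h_prev: float, c_prev: float, w: dict) -> tuple:
    """One step of a scalar LSTM cell (textbook form, not xLSTM).

    `w` maps a gate name to (w_x, w_h, bias); these names are
    hypothetical and exist only for this sketch.
    """
    def pre(name: str) -> float:
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))    # how much new information to write
    f = sigmoid(pre("forget"))   # how much old memory to keep
    o = sigmoid(pre("output"))   # how much memory to expose
    g = math.tanh(pre("cand"))   # candidate value to write

    c = f * c_prev + i * g       # gated memory update
    h = o * math.tanh(c)         # gated read-out
    return h, c


# Toy weights: all zeros, so every gate sits at 0.5 and the candidate is 0.
zero_w = {name: (0.0, 0.0, 0.0) for name in ("input", "forget", "output", "cand")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, w=zero_w)
```

With all weights at zero, the forget gate halves the previous memory and nothing new is written, which shows how the gates, rather than the raw input, decide what the cell remembers.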

The Sequence Opinion #852: The Bitter Lessons for Agentic Interfaces: A CLI for EVERYTHING
The post argues that the next evolution of agentic SaaS is a full command‑line interface (CLI) rather than expanding tool‑specific APIs. Large language models already excel at interpreting and generating shell commands, yet current platforms constrain them with rigid JSON...

The Sequence Knowledge #850: The Unexpected Comeback of RNNs
In the mid‑2010s, Recurrent Neural Networks (RNNs) were the go‑to architecture for sequence modeling, prized for their constant per‑token, O(1) inference cost and elegant hidden‑state updates. The 2017 "Attention Is All You Need" paper displaced RNNs with Transformers, whose parallelism and GPU‑friendly...
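The constant‑cost property can be sketched in a few lines. The toy example below assumes a scalar hidden state and made‑up weights (`w_h`, `w_x`); it contrasts an RNN, which carries one fixed‑size state from token to token, with the growing context an attention layer looks back over at each step. It is an illustration of the scaling argument, not code from the post.

```python
import math


def rnn_step(h: float, x: float, w_h: float = 0.5, w_x: float = 1.0) -> float:
    """Update a scalar hidden state; the cost does not depend on sequence length."""
    return math.tanh(w_h * h + w_x * x)


def run_rnn(tokens: list) -> float:
    """Process a sequence while storing only one hidden-state value."""
    h = 0.0
    for x in tokens:
        h = rnn_step(h, x)  # same amount of work for every token
    return h


def attention_context_sizes(n_tokens: int) -> list:
    """Number of cached positions attended over at each decoding step."""
    return list(range(1, n_tokens + 1))


final_state = run_rnn([1.0, 0.0, -1.0])
context_sizes = attention_context_sizes(3)
```

The RNN keeps a single number regardless of how many tokens it has seen, while the per‑step context for attention grows linearly with position.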

The Sequence Radar #849: Last Week in AI: OpenAI Ships Agents, xAI Eyes Cursor, DeepSeek and Kimi Advance
OpenAI unveiled Workspace Agents alongside GPT‑5.5, turning ChatGPT into a multi‑modal, enterprise‑level runtime that can orchestrate code, text, images and approvals. xAI struck a partnership with Cursor to embed AI‑driven coding agents directly into developers’ IDEs, while rival models DeepSeek v4...

The Sequence Opinion #848: The Agent’s Hands: CLI or MCP?
The post argues that the pivotal question for agentic AI is not which model to use but what the model can actually touch. It highlights two competing bridges between a language model and the world: the traditional command‑line interface (CLI)...

The Sequence AI of the Week #847: Everything You Need to Know About Claude Opus 4.7
Claude Opus 4.7 arrived with modest benchmark gains—SWE‑bench 87.6%, MCP‑Atlas +14.6 pp—but its real shift lies in the API. The release removes all sampling‑level knobs (temperature, top_p, top_k, thinking.budget_tokens) and forces the sole "adaptive" thinking mode. In their place come semantic controls:...

The Sequence Knowledge #846: Beyond Transformer: A New Series
The blog announces a new series tracking emerging alternatives to the Transformer architecture, which has dominated AI for a decade. It notes that self‑attention became popular because it mapped well onto GPUs and offered a simple token‑wise interaction model. The...

The Sequence Opinion #844: Harness Engineering: The Operating System for Agentic Software
The post introduces "harness engineering," a discipline that shifts focus from prompting large language models (LLMs) to building robust surrounding systems. By treating models as powerful yet imperfect operators, teams design tools, constraints, observability, and recovery mechanisms that...

The Sequence AI of the Week #843: The AI We Built But Can't Release: A Practical View Into the Claude...
Anthropic released a system card for its unreleased Claude Mythos preview, revealing an AI that outperforms current public models but cannot be launched due to safety and governance concerns. The document details the model’s “cyber‑leap” capabilities, reckless competence, and internal...

The Sequence Knowledge #842: Everything You Need to Know About World Models
The Sequence wraps up its deep‑dive into world models, arguing that the era of large language models is only a prologue to Physical AI. It highlights breakthroughs such as D4RT’s 4D environment reconstruction, World Labs’ Marble 3‑D geometry engine, DeepMind’s...

The Sequence Opinion #840: The Agent-Native Rewrite: Why Every Piece of Software Infrastructure Needs to Be Reimagined for AI Agents
The post argues that today’s software infrastructure was engineered for human users who read screens, interpret exceptions, and issue explicit commands. AI agents, however, operate by interpreting intent, generating code, and acting autonomously, breaking the traditional contract between runtime and...

The Sequence AI of the Week #839: Gemma 4 and the Compression of Intelligence
Gemma 4 marks Google’s shift from frontier‑style AI demos to everyday infrastructure. The model compresses advanced multimodal reasoning, long‑context handling, and agentic behavior into a lightweight runtime that can run on mobile devices and servers alike. Unlike typical chatbots, Gemma 4 is...

The Sequence Knowledge #838: Project GENIE: Building Playable Worlds From Pixels
Project GENIE, Google’s Generative Interactive Environment, moves AI beyond text‑only models toward real‑time video world simulation. By tokenizing raw pixels, the system builds a dynamic internal representation that reacts to user actions, effectively turning a viewer into an active participant....

The Sequence Opinion #836: Insurance for AI Agents? Not as Crazy as You Think
Software engineering is undergoing a paradigm shift as developers increasingly rely on large language models to write code through natural‑language prompts, a practice dubbed “vibe coding.” By 2026, these models are capable of autonomous, multi‑step research loops, evolving into “vibe...

The Sequence Knowledge #833: How to Build a World Model
The post outlines a practical toolkit for building modern AI world models, emphasizing that a world model is a layered stack rather than a single algorithm. It begins with tokenizing reality—compressing observations before reasoning—and proceeds through techniques like dynamics learning,...

The Sequence Radar #832: Last Week in AI: Compression, Voice, and Why It All Matters
Google Research unveiled TurboQuant, a 3‑bit KV‑cache quantization that cuts memory use six‑fold and delivers up to eight times faster inference on H100 GPUs with no measurable accuracy loss. In the same week Google released Gemini 3.1 Flash Live, a single native audio...

The Sequence AI of the Week #830: The Quiet Ambush: Inside the Amazing MiMo-V2-Pro Aka Hunter Alpha
The blog spotlights MiMo‑V2‑Pro, also known as Hunter Alpha, a stealthy AI agent that appeared as an unnamed OpenRouter API endpoint. Unlike traditional chat‑focused models, MiMo‑V2‑Pro showcases hybrid attention and multi‑token prediction, positioning it for autonomous, tool‑driven workflows. The author...

The Sequence AI of the Week #826: Sleep While It Computes: Inside Karpathy’s AutoResearch
Andrej Karpathy unveiled AutoResearch, an open‑source framework that automates the full machine‑learning research loop—from hypothesis generation to model evaluation—without human intervention. The system continuously runs experiments while researchers sleep, effectively turning GPUs into "sleeping computers" that iterate at machine speed....

The Sequence Knowledge #825: Inside World Labs Marble
World Labs, the company founded by Dr. Fei-Fei Li, unveiled Marble, a Large World Model that shifts AI focus from temporal pixel prediction to spatial intelligence. The model lifts 2D observations into a 4D representation, enabling reconstruction, generation, and simulation of persistent...

The Sequence Opinion #823: SaaSmagedon, Is SaaS Dead?: Vibe Coding, Agentic Engineering, and the Collapse of the Code Moat
The software sector experienced a dramatic correction in early 2026, wiping out over $1 trillion in market value in a single week. Analysts label this upheaval “SaaSmagedon,” citing the erosion of traditional SaaS fundamentals—per‑seat pricing, human‑centric interfaces, and the protective “code...

The Sequence Opinion #819: How AI Chips Are Made?
The post explains that AI performance in 2026 hinges more on hardware than algorithms, with GPUs—originally built for graphics—serving as the foundation for neural‑network training. It outlines the engineering journey from high‑level RTL and Verilog code through physical design to...

The Sequence AI of the Week #818: You Cannot Miss Qwen 3.5
Alibaba's Qwen team unveiled the Qwen 3.5 series, spanning flagship 397B, medium 35B, and small 0.8B‑9B models optimized for edge devices. The lineup introduces a radical architectural shift, replacing dense transformers with extreme Mixture‑of‑Experts sparsity and native multimodal support. Benchmarks...

The Sequence Knowledge #817: DeepMind Genie and Interactive World Models
The post introduces actionable world models, a new class of AI that can not only generate realistic video but also manipulate and control simulated environments. It highlights DeepMind’s Genie series as a leading example, showcasing models that turn passive video...
