Google DeepMind Researchers Release Gemma Scope 2 as a Full Stack Interpretability Suite for Gemma 3 Models
AI

Google DeepMind Researchers Release Gemma Scope 2 as a Full Stack Interpretability Suite for Gemma 3 Models

MarkTechPost
MarkTechPostDec 23, 2025

Why It Matters

By exposing the internal mechanics of large language models, Gemma Scope 2 gives AI‑safety teams a practical microscope to diagnose and mitigate risky behaviors, accelerating trustworthy AI deployment.

Google DeepMind Researchers Release Gemma Scope 2 as a Full Stack Interpretability Suite for Gemma 3 Models

Google DeepMind Researchers introduce Gemma Scope 2, an open suite of interpretability tools that exposes how Gemma 3 language models process and represent information across all layers, from 270 M to 27 B parameters.

Its core goal is simple: give AI‑safety and alignment teams a practical way to trace model behavior back to internal features instead of relying only on input‑output analysis. When a Gemma 3 model jailbreaks, hallucinates, or shows sycophantic behavior, Gemma Scope 2 lets researchers inspect which internal features fired and how those activations flowed through the network.

Image 1: Qualifire's Rogue feature is an open source dynamic evaluation tool for agentic systems, which utilizes OpenAI V4+, JS OpenAI, and Python OpenAI

What is Gemma Scope 2?

Gemma Scope 2 is a comprehensive, open suite of sparse autoencoders and related tools trained on internal activations of the Gemma 3 model family. Sparse autoencoders (SAEs) act as a microscope on the model. They decompose high‑dimensional activations into a sparse set of human‑inspectable features that correspond to concepts or behaviors.

Training Gemma Scope 2 required storing around 110 petabytes of activation data and fitting over 1 trillion total parameters across all interpretability models.

The suite targets every Gemma 3 variant, including 270 M, 1 B, 4 B, 12 B and 27 B‑parameter models, and covers the full depth of the network. This is important because many safety‑relevant behaviors only appear at larger scales.

Image 2: ML Global Impact Report with Ensemble Methods is at 26 %, Classical ML at 47 %, Clustering at 4 %, Neural Networks at 36 %, …

What is new compared to the original Gemma Scope?

The first Gemma Scope release focused on Gemma 2 and already enabled research on model hallucination, identifying secrets known by a model, and training safer models.

Gemma Scope 2 extends that work in four main ways:

  1. Broader coverage – The tools now span the entire Gemma 3 family up to 27 B parameters, which is needed to study emergent behaviors observed only in larger models (e.g., the behavior previously analyzed in the 27 B‑size C2S Scale model for scientific discovery tasks).

  2. Layer‑wise SAEs and transcoders – SAEs and transcoders are trained on every layer of Gemma 3. Skip transcoders and cross‑layer transcoders help trace multi‑step computations that are distributed across layers.

  3. Matryoshka training – The suite applies the Matryoshka training technique so that SAEs learn more useful and stable features, mitigating some flaws identified in the earlier Gemma Scope release.

  4. Chat‑tuned interpretability – Dedicated tools for Gemma 3 models tuned for chat make it possible to analyze multi‑step behaviors such as jailbreaks, refusal mechanisms, and chain‑of‑thought faithfulness.

Key Takeaways

  1. Gemma Scope 2 is an open interpretability suite for all Gemma 3 models, from 270 M to 27 B parameters, with SAEs and transcoders on every layer of both pretrained and instruction‑tuned variants.

  2. The suite uses sparse autoencoders as a microscope that decomposes internal activations into sparse, concept‑like features, plus transcoders that track how these features propagate across layers.

  3. Gemma Scope 2 is explicitly positioned for AI‑safety work to study jailbreaks, hallucinations, sycophancy, refusal mechanisms, and discrepancies between internal state and communicated reasoning in Gemma 3.


Michal Sutter

Michal Sutter is a data‑science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.

Comments

Want to join the conversation?

Loading comments...