Episode 15 - Inside the Model Spec

OpenAI
OpenAIMar 25, 2026

Why It Matters

A transparent, iteratively refined model spec ensures AI behavior aligns with safety, user expectations, and regulatory standards, fostering trust as models become increasingly powerful.

Key Takeaways

  • Model spec defines high‑level behavior goals for OpenAI models
  • Spec is a public, open‑source document for transparency and feedback
  • Chain of command hierarchy resolves conflicts between policies, developers, users
  • Deliberative alignment translates spec language into training signals
  • Continuous iteration aligns model output with evolving spec expectations

Summary

The episode introduces OpenAI’s model specification—a comprehensive, publicly available guide that outlines how its AI systems should behave. Jason Wolf explains that the spec is not a strict implementation rulebook but a high‑level description of intended behavior for employees, developers, policymakers, and end‑users. It captures core decisions, such as safety priorities, tone, and steerability, while acknowledging that many product features (memory, usage‑policy enforcement) lie outside its scope.

Key insights include the spec’s structure: a 100‑page document beginning with OpenAI’s mission, followed by detailed policies, examples, and authority levels. Policies are organized by a “chain of command” that ranks OpenAI instructions above developer instructions, which in turn outrank user instructions, preserving safety while allowing user steerability. The spec is continuously refined through public feedback via the model‑spec.openai.com site, GitHub forks, and in‑product reporting, with changes feeding back into training processes like deliberative alignment.

Wolf shares concrete examples, such as handling a child’s question about Santa Claus. The spec advises a cautious, vague response to protect the child’s imagination while maintaining honesty, illustrating the nuanced trade‑offs between honesty, safety, and user context. He also recounts the spec’s origin—stemming from a desire to replace opaque reinforcement‑learning‑from‑human‑feedback data with a clear, handbook‑style guide that can evolve as models become more capable.

The implications are significant: developers now have a transparent framework to anticipate model behavior, regulators gain insight into OpenAI’s safety commitments, and users benefit from more predictable, ethically aligned interactions. As AI systems grow in capability, the model spec serves as a living contract that balances innovation with societal safeguards.

Original Description

The more AI can do, the more we need to ask what it should and shouldn’t do. In this episode, OpenAI researcher Jason Wolfe joins host Andrew Mayne to talk about the Model Spec, the public framework that defines intended model behavior. They discuss how the Model Spec works in practice, including how the chain of command handles conflicts between instructions, and how OpenAI evolves it based on feedback, real-world use, and new model capabilities.
More on our approach to the Model Spec: https://openai.com/index/our-approach-to-the-model-spec/
Chapters
00:00 Introduction
01:10 What is the Model Spec?
03:55 How does the Model Spec work in practice?
06:26 Transparency: Where to read the Model Spec & give feedback
07:51 How did the Model Spec originate?
10:02 How does the spec translate into model behavior?
11:26 What is the hierarchy / chain of command?
13:35 Handling edge cases like Santa Claus
17:41 How does the Model Spec evolve over time?
19:59 What happens when models disagree with the spec?
22:05 How do smaller models follow the spec?
23:16 Is chain-of-thought useful for alignment?
24:16 Model Spec vs Anthropic’s Constitution
26:28 What surprised you most?
26:56 How do you define the scope of the spec?
27:44 What is the future of the Model Spec?
31:16 How should developers think about the spec?
34:44 Asimov’s laws vs Model Spec
37:16 Could AI write a Human Spec?

Comments

Want to join the conversation?

Loading comments...