
OpenAI's internal data agent slashes data-analysis time from days to minutes, boosting productivity for engineers and analysts. It also demonstrates a scalable approach for enterprises grappling with sprawling, undocumented data assets.
Enterprises today face an unprecedented data explosion, with petabytes of information stored in countless tables that often lack clear documentation. Traditional data discovery relies on schema metadata and manual annotations, which can miss subtle transformations applied during table generation. OpenAI’s internal data agent tackles this problem by marrying natural‑language interfaces with deep code analysis, allowing non‑technical staff to retrieve precise insights without combing through endless data catalogs.
At the heart of the system is Codex Enrichment, a technique that parses the company’s codebase to derive the true semantics of each dataset. By understanding how tables are built—filters, joins, and business logic—the agent creates a richer definition than what metadata alone can provide. This enrichment sits alongside five other context layers: raw schema, expert‑curated descriptions, institutional knowledge harvested from Slack and Docs, a learning memory that captures prior corrections, and live query capability for real‑time data pulls. The layered architecture ensures that the agent can disambiguate tables with similar structures but different business meanings, dramatically reducing the time needed for accurate analysis.
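To make the layering concrete, the sketch below shows one way such context could be assembled for a single table before a question is handed to the model. It is illustrative only: the names (TableContext, build_prompt_context, the example table and fields) are hypothetical assumptions, not details of OpenAI's implementation, which the article does not describe.

```python
# Minimal sketch of the layered-context idea, under assumed names.
# Each field corresponds to one of the context layers described above.
from dataclasses import dataclass, field


@dataclass
class TableContext:
    """Everything the agent knows about one table, grouped by source."""
    name: str
    raw_schema: dict[str, str]                    # column -> type, from the warehouse
    codex_enrichment: str = ""                    # semantics derived from pipeline code
    curated_description: str = ""                 # expert-written documentation
    institutional_notes: list[str] = field(default_factory=list)  # Slack/Docs snippets
    corrections: list[str] = field(default_factory=list)          # learning memory


def build_prompt_context(table: TableContext, question: str) -> str:
    """Assemble the layered context into a single prompt block.

    Layers are ordered roughly from most authoritative (code-derived
    semantics) to most incidental (institutional chatter), so conflicting
    descriptions can be weighed against each other.
    """
    sections = [
        f"Question: {question}",
        f"Table: {table.name}",
        "Schema: " + ", ".join(f"{c} ({t})" for c, t in table.raw_schema.items()),
    ]
    if table.codex_enrichment:
        sections.append("Derived semantics (from pipeline code): " + table.codex_enrichment)
    if table.curated_description:
        sections.append("Curated description: " + table.curated_description)
    if table.institutional_notes:
        sections.append("Institutional notes: " + " | ".join(table.institutional_notes))
    if table.corrections:
        sections.append("Prior corrections: " + " | ".join(table.corrections))
    return "\n".join(sections)


if __name__ == "__main__":
    # Hypothetical table whose raw schema alone would be ambiguous.
    ctx = TableContext(
        name="daily_active_users",
        raw_schema={"user_id": "string", "event_date": "date", "is_internal": "bool"},
        codex_enrichment="Built by filtering raw events to is_internal = false and "
                         "deduplicating on (user_id, event_date).",
        curated_description="One row per external user per active day.",
        institutional_notes=["#analytics: internal accounts were removed from this table in Q3."],
        corrections=["Earlier answers wrongly included internal users; always exclude them."],
    )
    print(build_prompt_context(ctx, "How many external users were active yesterday?"))
```

In a sketch like this, the code-derived layer is what distinguishes two structurally identical tables: the filters and joins recovered from the pipeline carry the business meaning that schema metadata alone cannot.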
The broader impact extends beyond OpenAI. As more organizations accumulate massive, loosely governed data lakes, the need for AI‑driven data assistants that can interpret code and institutional context will grow. By cutting the turnaround on data questions from days to minutes, such agents can accelerate product development cycles, improve decision‑making speed, and lower the operational cost of data engineering. OpenAI's approach offers a blueprint for building scalable, self‑updating data discovery tools that keep pace with evolving codebases, positioning AI as a critical layer in modern data governance.