
The Build Hour session introduced OpenAI’s prompt caching feature, a mechanism that reuses computation for repeated prompt prefixes to cut latency and reduce API costs. Erica explained that once a request exceeds 1,024 tokens, OpenAI begins caching 128‑token blocks, automatically handling text, image, and audio inputs without code changes. Developers can extend cache lifetimes to 24 hours and influence routing with an optional prompt cache key, which steers similar requests to the same engine for higher hit rates.

Key data points included a 50‑90% discount on cached tokens across model families and up to a 99% discount for speech‑to‑speech caching. In a benchmark of 2,300 prompts ranging from 1,024 to 200,000 tokens, cached requests showed 67% faster time‑to‑first‑token for the longest inputs, while short prompts saw modest latency gains. A live demo with an AI styling assistant showed costs falling from $0.35 to $0.21 per batch when leveraging implicit caching and a prompt cache key, while latency remained comparable for 2,000‑token prompts.

The session also covered the technical underpinnings: OpenAI hashes the first 256 tokens and checks for matching 128‑token chunks, reusing attention matrix outputs (floating‑point numbers) rather than recomputing them. Developers are advised to keep prompt prefixes deterministic, avoiding timestamps or stray whitespace, and to employ context engineering, truncation, summarization, and appropriate endpoint selection to maximize cache hits. For businesses, prompt caching can translate into substantial cost savings at scale and more predictable response times for heavyweight workloads, especially in multimodal applications like image batch processing or long conversational threads. By structuring prompts for cacheability and using the prompt cache key, teams can achieve higher throughput without sacrificing model intelligence.
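The structuring advice above can be sketched in code. This is a minimal illustration, not the session's own demo: the catalog contents, function name, and cache-key string are hypothetical, while the message ordering and the `prompt_cache_key` request field follow the pattern described in the session. The idea is that everything static goes first so the shared prefix stays byte-identical across requests, and the per-request query goes last.

```python
# Sketch: building a cache-friendly Chat Completions payload.
# Assumes the `prompt_cache_key` request parameter discussed in the session;
# catalog items and the "styling-assistant-v1" key are made-up examples.

# Deterministic prefix: no timestamps, no request-specific data, no stray
# whitespace, so every request shares an identical cacheable prefix.
STATIC_SYSTEM_PROMPT = (
    "You are a styling assistant. Recommend outfits from the catalog below.\n"
    "CATALOG:\n"
    + "\n".join(f"item-{i}: placeholder description" for i in range(500))
)

def build_request(user_query: str, model: str = "gpt-4o") -> dict:
    """Return a request payload structured for prompt caching.

    Static content (system prompt, catalog) comes first; the variable
    user query comes last, so only the tail of the prompt changes.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": STATIC_SYSTEM_PROMPT},
            {"role": "user", "content": user_query},
        ],
        # One stable key per workload (not per request) so similar
        # requests route to the same engine and hit the warm cache.
        "prompt_cache_key": "styling-assistant-v1",
    }

req = build_request("What goes with a navy blazer?")
```

Note the key is shared across the whole workload: a fresh key on every call would defeat the routing benefit, since no two requests would be grouped together.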

OpenAI used its latest developer briefing to outline a refreshed deep‑research agenda for ChatGPT, positioning the next wave of upgrades as a strategic response to growing enterprise demand for more capable, trustworthy conversational AI. The company highlighted three technical pillars:...

The video follows a product manager who integrates the Codex app into daily workflows to bridge the gap between non‑coding responsibilities and the technical codebase. By asking Codex to explain unfamiliar UI elements and to perform routine tasks, the PM...

We build the tools. You build the future. Start building with Codex. https://openai.com/codex/

The video showcases how the Codex app’s worktree feature lets developers run multiple tasks in parallel, illustrated by adding a drag‑and‑drop sorting capability for pinned tasks. By delegating work to separate worktrees, the user can continue other development activities without...

The video showcases how the Original Tamale Company, a multigenerational family operation, leverages ChatGPT to transition from informal backyard sales to a more formalized business model. By prompting the AI, the owner quickly derives wholesale pricing with a 60% margin, generates...

The video follows the owner of Reno Salvage, an 86‑year‑old metal‑recycling yard, as he confronts the inefficiencies of a paper‑based workflow and experiments with ChatGPT to digitize daily operations. He explains that his grandfather built the business on manual records,...

The video showcases a family farm leveraging ChatGPT to modernize record‑keeping, reporting, and decision‑making across generations. By digitizing a 1971 handwritten crop ledger and automating tasks such as logging peanut loads, generating water‑usage reports, and mapping seed layouts, the farm...

The video showcases the Codex app’s deep integration with Figma, allowing designers to launch a one‑click install via MCP and immediately access a dedicated Figma skill. This partnership streamlines the workflow: users copy a Figma file link, paste it into...

The video showcases how the Codex app automates routine software development tasks, turning repetitive chores into background processes that run on scheduled intervals. It walks through several automations: a daily commit‑summary that groups recent changes and highlights contributors; an “Upskill” routine...

The video unveils the Codex app, an AI-driven interface that consolidates project oversight and code generation into a single command center, allowing developers to delegate routine work to autonomous agents. Codex lets users issue commands by typing or speaking, then watches...

The video announces Prism, a free AI‑native platform that lives inside scientists’ editing environments, aiming to give researchers the same conversational AI assistance developers enjoy in code editors. Prism embeds the latest GPT‑5.2 model directly in the LaTeX editor, allowing instant...

As AI begins to meaningfully accelerate scientific discovery, we’re taking an early step to reduce friction in day-to-day research work with Prism. Prism is a free workspace for scientists to write and collaborate on research, powered by GPT-5.2. Prism offers unlimited...

The OpenAI Town Hall, hosted by CEO Sam Altman, centered on the growing urgency of AI ethics as the technology scales. Altman framed transparency, safety, and societal impact as the three pillars guiding OpenAI’s roadmap, signaling a shift from pure...

The OpenAI Build Hour introduced the ChatGPT Apps platform, showcasing a suite of new tools—including an Apps SDK, a public app submission flow, and a marketplace—designed to let developers embed interactive experiences directly inside ChatGPT. Core announcements covered the launch...