Build faster, cheaper, and with lower latency using prompt caching. This Build Hour breaks down how prompt caching works and how to design your prompts to maximize cache hits. Learn what’s actually being cached, when caching applies, and how small changes in your prompts can have a big impact on cost and performance.
Erika Kettleson (Solutions Engineer) covers:
• What prompt caching is and why it matters for real-world apps
• How cache hits work (prefixes, token thresholds, and continuity)
• Best practices like using the Responses API and prompt_cache_key
• How to measure cache hit rate, latency, and token savings
• Customer Spotlight: Warp (https://www.warp.dev/), with Suraj Gupta (Team Lead) explaining the impact of prompt caching
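To make "prefixes, token thresholds" and "cache hit rate" concrete, here is a minimal offline sketch. It assumes usage records shaped like the Responses API `usage` object (`input_tokens` plus `input_tokens_details.cached_tokens`), and it assumes the 1,024-token minimum prompt length that OpenAI documents for caching. Treat both as assumptions and check them against the current docs. It makes no API calls. The helper names (`summarize_usage`, `shared_prefix_len`, `is_cache_eligible`) and the sample numbers are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Mapping

# Assumption: OpenAI documents caching only for prompts of at least 1024 tokens.
MIN_CACHEABLE_TOKENS = 1024


@dataclass
class CacheStats:
    requests: int
    input_tokens: int
    cached_tokens: int

    @property
    def hit_rate(self) -> float:
        """Fraction of input tokens served from cache (0.0 when there is no input)."""
        return self.cached_tokens / self.input_tokens if self.input_tokens else 0.0


def summarize_usage(usages: Iterable[Mapping[str, Any]]) -> CacheStats:
    """Aggregate usage dicts assumed to look like
    {"input_tokens": int, "input_tokens_details": {"cached_tokens": int}}."""
    requests = input_tokens = cached_tokens = 0
    for usage in usages:
        requests += 1
        input_tokens += usage.get("input_tokens", 0)
        details = usage.get("input_tokens_details") or {}
        cached_tokens += details.get("cached_tokens", 0)
    return CacheStats(requests, input_tokens, cached_tokens)


def shared_prefix_len(a: List[int], b: List[int]) -> int:
    """Count the leading tokens two prompts share. Only an identical prefix is
    reusable, so one changed early token (e.g. a timestamp) ends the match."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def is_cache_eligible(prompt_tokens: int) -> bool:
    """Hypothetical check against the assumed minimum prompt length."""
    return prompt_tokens >= MIN_CACHEABLE_TOKENS


if __name__ == "__main__":
    # Hypothetical records: three requests that share a long, static system prompt.
    records = [
        {"input_tokens": 2000, "input_tokens_details": {"cached_tokens": 0}},
        {"input_tokens": 2100, "input_tokens_details": {"cached_tokens": 1920}},
        {"input_tokens": 2050, "input_tokens_details": {"cached_tokens": 1920}},
    ]
    stats = summarize_usage(records)
    print(f"{stats.requests} requests, cache hit rate {stats.hit_rate:.1%}")
```

The first request pays full price and warms the cache. Later requests reuse only the part of the prompt that matches token for token from the start. This is why static content (instructions, tool definitions, examples) usually goes first and per-request content goes last.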
00:00 Introduction
02:37 Foundations, Mechanics, API Walkthrough
12:11 Demo: Batch Image Processing
16:55 Demo: Branching Chat
26:02 Demo: Long Running Compaction
32:39 Cache Discount Pricing Overview
36:03 Customer Spotlight: Warp
49:37 Q&A