Generative AI in the Real World: Phillip Carter on Where Generative AI Meets Observability

O’Reilly Media
Jun 11, 2026

Why It Matters

Effective observability is essential for reliable, scalable generative AI products; without it, businesses face hidden failures, poor user experiences, and costly downtime.

Key Takeaways

  • Generative AI expands observability challenges beyond traditional system monitoring.
  • LLM APIs introduce high‑cardinality, multi‑dimensional telemetry data streams.
  • Existing tools struggle with AI-specific metrics, prompting new observability solutions.
  • Teams must learn to evaluate model outputs against the prompts that produced them.
  • The early‑stage AI observability market lacks integration, leaving teams with fragmented toolchains.

Summary

The podcast features Phillip Carter, a Salesforce product manager, discussing how generative AI reshapes observability. He begins with a concise definition: observability is the practice of collecting telemetry to understand complex, distributed systems that cannot be debugged step‑by‑step on a local machine. With the rise of large language model (LLM) APIs, this discipline now has to cope with larger data volumes, high‑cardinality signals, and multi‑modal inputs.

Carter highlights several challenges. Traditional monitoring tools focus on Kubernetes metrics, logs, and traces, but AI‑driven workloads generate streams of prompts, model responses, and chained calls that are difficult to aggregate. Evaluating whether a model’s output meets business intent requires new metrics, lab‑to‑production comparisons, and systematic hypothesis testing around prompts, data quality, and model selection. Existing observability platforms often choke on the dimensionality of AI telemetry, which has prompted a wave of specialized AI‑observability solutions. Those tools, however, rarely integrate cleanly with legacy stacks.
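The dimensionality problem can be made concrete with some rough arithmetic. In label-based metrics systems, each distinct combination of label values is typically stored as its own time series, so the worst-case series count is the product of the distinct values per dimension. The sketch below is illustrative only: the dimension names and counts are invented for the example and do not come from the episode.

```python
from math import prod
from typing import Mapping

# Hypothetical dimension counts, chosen only to illustrate the growth.
INFRA_DIMS = {"service": 50, "pod": 200, "status_code": 5}

# The same request, now also tagged with LLM-specific fields (assumed names).
LLM_DIMS = {**INFRA_DIMS, "model": 4, "prompt_template": 30, "user_id": 10_000}


def max_series(distinct_values: Mapping[str, int]) -> int:
    """Upper bound on unique label combinations: the product of the
    number of distinct values in each dimension."""
    return prod(distinct_values.values())


if __name__ == "__main__":
    print("infrastructure only:", max_series(INFRA_DIMS))
    print("with LLM dimensions:", max_series(LLM_DIMS))
```

Real systems rarely reach the full product because many combinations never occur, but the bound shows why adding a few per-user or per-prompt fields can overwhelm a store designed around a small, fixed set of labels.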

He cites concrete examples: Google’s pre‑Gemini natural‑language answer boxes, Honeycomb’s data‑query engine, and production agents that loop LLM calls. In each case, engineers must capture input signals, model parameters, and output quality to diagnose failures—whether a bad answer stems from the model, the prompt, or downstream system components. Carter notes that large tech firms have built internal tools for these problems, but the broader market is still nascent.
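One way to capture inputs, model parameters, and output quality together is to emit a single structured event per LLM call, tied to the other calls in the same chain by a shared trace ID. The following is a minimal sketch using only the Python standard library. The `LLMCallEvent` schema, the `traced_call` helper, and the `scorer` callback are all hypothetical names invented for illustration; they are not an API from Honeycomb or any other tool mentioned in the episode.

```python
import time
import uuid
from dataclasses import asdict, dataclass
from typing import Callable, List, Optional


@dataclass
class LLMCallEvent:
    """One event per LLM call (hypothetical schema, not a standard)."""
    trace_id: str            # shared by every call in one chain or agent loop
    step: int                # position of this call within the chain
    model: str
    temperature: float
    prompt: str
    response: str = ""
    latency_ms: float = 0.0
    error: Optional[str] = None
    quality_score: Optional[float] = None


def traced_call(
    llm: Callable[[str], str],
    prompt: str,
    *,
    trace_id: str,
    step: int,
    model: str,
    temperature: float,
    sink: List[dict],
    scorer: Optional[Callable[[str, str], float]] = None,
) -> str:
    """Call `llm` and append one event to `sink`, even if the call fails."""
    event = LLMCallEvent(trace_id, step, model, temperature, prompt)
    start = time.perf_counter()
    try:
        try:
            event.response = llm(prompt)
        finally:
            # Latency covers the model call only, not the scoring step.
            event.latency_ms = (time.perf_counter() - start) * 1000
        if scorer is not None:
            event.quality_score = scorer(prompt, event.response)
        return event.response
    except Exception as exc:
        event.error = repr(exc)
        raise
    finally:
        sink.append(asdict(event))


if __name__ == "__main__":
    events: List[dict] = []
    trace = uuid.uuid4().hex

    def fake_llm(text: str) -> str:  # stand-in for a real model client
        return text.upper()

    traced_call(fake_llm, "summarize recent errors", trace_id=trace, step=0,
                model="example-model", temperature=0.2, sink=events,
                scorer=lambda p, r: 1.0 if r else 0.0)
    print(events[0])
```

Keeping the prompt, parameters, response, and score in one record means a bad answer can be filtered by model, prompt, or chain step after the fact, which is the kind of question the paragraph above describes.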

The implication is clear: enterprises adopting generative AI must invest in new observability practices, upskill engineering teams, and adopt or build tools that can handle high‑cardinality, AI‑specific telemetry. Failure to do so risks degraded user experiences, hidden bugs, and escalating operational costs, while creating a sizable opportunity for vendors that can bridge AI observability with existing monitoring ecosystems.

Original Description

Phillip Carter, formerly of Honeycomb, joins Ben to talk about observability and AI—what observability means, how generative AI causes problems for observability, and how generative AI can be used as a tool to help SREs analyze telemetry data. There’s tremendous potential because AI is great at finding patterns in massive datasets, but it’s still a work in progress.