Unified OpenTelemetry tracing turns opaque LLM‑driven agents into debuggable services, improving reliability, cost control, and trust for enterprises deploying AI at scale.
The video demonstrates how OpenTelemetry combined with Jaeger can provide end‑to‑end visibility into AI agents running in Kubernetes, turning what appears to be a black‑box LLM interaction into an observable distributed trace. By instrumenting the agent, its prompts, tool calls, and downstream service requests, developers can capture each operation as a span and stitch them together into a single trace that mirrors traditional microservice debugging.
Because generative agents decide their own execution path, two identical user requests can produce wildly different latency and span counts, as shown by a 10‑second, 10‑span request versus a 60‑second, 42‑span request. OpenTelemetry's vendor‑neutral tracing model, enhanced with the new gen_ai semantic conventions, records model identifiers, token usage, and tool invocations, enabling systematic analysis of these unpredictable flows.
The presenter highlights a live demo where the agent queries Kubernetes resources, exposing every API call, vector search, and embedding operation as individual spans. He contrasts this approach with specialized AI observability platforms like LangSmith and LangFuse, noting that while they excel at prompt‑level debugging, they silo data away from existing infrastructure traces.
Adopting OpenTelemetry unifies AI agent telemetry with existing API, database, and message‑queue traces, giving operators a single pane of glass for performance, cost, and reliability insights. This standardization reduces vendor lock‑in, simplifies root‑cause analysis, and builds trust in AI‑driven workflows across production environments.