LLMs on Kubernetes Part 1: Understanding the Threat Model

•March 30, 2026

CNCF Blog•Mar 30, 2026

Why It Matters

Self‑hosted LLMs expose enterprises to novel attack vectors that can leak secrets, bypass controls, or cause unauthorized actions, making robust AI‑specific security essential for compliance and operational safety.

Key Takeaways

•Prompt injection lets users alter model behavior.
•LLMs can unintentionally expose secrets in responses.
•Unverified model downloads risk hidden backdoors.
•Excessive agency grants models dangerous tool access.
•Policy layer needed separate from inference runtime.

Pulse Analysis

The rapid adoption of large‑language models has pushed many organizations to host inference workloads inside their own Kubernetes clusters. Containers give operators familiar scheduling, scaling, and isolation, yet they hide a crucial difference: an LLM processes untrusted natural‑language prompts and can produce arbitrary output. Kubernetes guarantees that a pod runs, but it cannot judge whether a user’s request should be allowed or whether the generated text leaks confidential data. This mismatch creates a new attack surface that traditional cluster‑level policies simply do not cover.

To make sense of these threats, the OWASP Top 10 for LLM applications provides a useful checklist. Prompt injection (LLM01) is the LLM analogue of SQL injection, letting attackers steer model behavior through crafted text. Sensitive information disclosure (LLM02) can surface API keys or internal configurations that the model has memorized. Supply‑chain weaknesses (LLM03) arise when models are pulled from unverified registries, mirroring container image tampering. Finally, excessive agency (LLM06) grants the model the ability to invoke tools or APIs, raising privilege‑escalation concerns. None of these risks are mitigated by pod security policies alone.

The practical solution is to insert an AI‑aware policy layer in front of the inference service, much like an API gateway but with prompt‑validation and output‑filtering capabilities. Open‑source projects such as LiteLLM, Kong AI Gateway, Portkey, and kgateway already expose a unified endpoint, enforce rate limits, perform content moderation, and allow model allow‑lists. By decoupling policy enforcement from the runtime, operators retain Kubernetes’ orchestration benefits while adding defense‑in‑depth for LLM‑specific attacks. As enterprises continue to embed generative AI into internal tools, adopting a dedicated gateway becomes a prerequisite for compliance, cost control, and secure AI operations.

LLMs on Kubernetes Part 1: Understanding the Threat Model

Why It Matters

Key Takeaways

Pulse Analysis

Ask Pulse AI:

Comments

AI Pulse