SREcon26 Americas - Intelligent Load Balancing in Kubernetes

USENIX Association
Apr 23, 2026

Why It Matters

Eliminating kube‑proxy from the data path unlocks consistent performance for modern microservices, especially gRPC‑heavy workloads, giving SREs a scalable tool to curb latency and cost.

Key Takeaways

  • Kube‑proxy/DNS causes uneven load for persistent gRPC streams
  • Client‑side xDS feeds live EndpointSlice data to applications
  • Power of Two Choices algorithm balances traffic across pods
  • Zone‑affinity routing reduces cross‑zone latency and bandwidth

Pulse Analysis

Kubernetes’ built‑in load balancing relies on kube‑proxy and DNS, which work well for short HTTP requests but falter when services maintain long‑lived connections or multiplex thousands of gRPC calls over a single TCP stream. Because kube‑proxy picks a backend once per connection rather than once per request, every call on a long‑lived connection lands on the same pod. Traffic then concentrates on a subset of pods, creating hot spots, inflating tail latency, and wasting compute resources. This limitation has become a pain point for enterprises that run data‑intensive workloads, where even modest latency spikes can translate into measurable revenue loss.
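The skew is easy to reproduce in a toy model. The sketch below is an illustration only, not a measurement from the talk: it assumes each client opens a single long‑lived connection, that a pod is chosen uniformly at random, and that every client sends the same number of requests. The function names (`connection_level`, `request_level`, `imbalance`) are made up for this example.

```python
import random
from collections import Counter


def connection_level(num_clients, requests_per_client, pods, rng):
    """Pick a pod once per client connection; all its requests follow."""
    load = Counter({pod: 0 for pod in pods})
    for _ in range(num_clients):
        pod = rng.choice(pods)  # decided at connect time only
        load[pod] += requests_per_client
    return load


def request_level(num_clients, requests_per_client, pods, rng):
    """Pick a pod independently for every single request."""
    load = Counter({pod: 0 for pod in pods})
    for _ in range(num_clients * requests_per_client):
        load[rng.choice(pods)] += 1
    return load


def imbalance(load):
    """Busiest pod's load divided by the mean load (1.0 is perfectly even)."""
    mean = sum(load.values()) / len(load)
    return max(load.values()) / mean


if __name__ == "__main__":
    rng = random.Random(0)
    pods = [f"pod-{i}" for i in range(10)]
    per_conn = connection_level(5, 1000, pods, rng)
    per_req = request_level(5, 1000, pods, rng)
    print("per-connection imbalance:", imbalance(per_conn))
    print("per-request imbalance:   ", imbalance(per_req))
```

With five heavy clients and ten pods, per‑connection selection leaves at least half the pods idle, so the imbalance ratio cannot drop below 2.0, while per‑request selection stays close to 1.0.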

The Databricks team’s solution shifts the routing logic to the client side, driven by a lightweight control plane that monitors Service and EndpointSlice changes. By streaming these updates through the xDS API, client libraries gain real‑time visibility of healthy endpoints and can make per‑request decisions at Layer 7. Techniques like Power of Two Choices (sampling two pods at random and sending the request to the less loaded one), combined with zone‑affinity routing, spread requests evenly while keeping them within the same availability zone whenever possible. Early production deployments report stabilized tail latency and up to a 30% reduction in CPU waste compared with the default kube‑proxy path.
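A minimal sketch of how Power of Two Choices and zone affinity can be combined on the client is shown below. It is not the Databricks implementation. The `Endpoint` type, the `pick_p2c` function, and the use of a client‑local in‑flight request count as the load signal are all assumptions made for illustration; the talk summary does not say which load signal is used or how the fallback across zones works.

```python
import random
from dataclasses import dataclass


@dataclass
class Endpoint:
    address: str
    zone: str
    in_flight: int = 0  # assumed load signal: requests outstanding from this client


def candidates(endpoints, local_zone):
    """Prefer endpoints in the caller's zone; fall back to all zones if none."""
    local = [e for e in endpoints if e.zone == local_zone]
    return local or list(endpoints)


def pick_p2c(endpoints, local_zone, rng=random):
    """Sample two distinct candidates and return the less loaded one."""
    pool = candidates(endpoints, local_zone)
    if not pool:
        raise ValueError("no endpoints available")
    if len(pool) == 1:
        return pool[0]
    first, second = rng.sample(pool, 2)
    return first if first.in_flight <= second.in_flight else second


if __name__ == "__main__":
    eps = [
        Endpoint("10.0.0.1:8080", "zone-a", in_flight=5),
        Endpoint("10.0.0.2:8080", "zone-a", in_flight=0),
        Endpoint("10.0.1.1:8080", "zone-b", in_flight=1),
    ]
    chosen = pick_p2c(eps, "zone-a")
    chosen.in_flight += 1  # caller increments on send, decrements on completion
    print(chosen.address)
```

Comparing only two random candidates keeps the per‑request cost constant, and it avoids the herd effect that can occur when every client chases the single least‑loaded pod using slightly stale load data.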

For SREs and platform engineers, this paradigm shift introduces new operational considerations: managing the control‑plane lifecycle, ensuring xDS compatibility across language SDKs, and monitoring client‑side routing metrics. However, the payoff is a more resilient service mesh that adapts instantly to topology changes without requiring external proxies. As microservice architectures continue to adopt high‑throughput protocols like gRPC, client‑driven intelligent load balancing is poised to become a best‑practice component of modern Kubernetes platforms.

Original Description

Intelligent Load Balancing in Kubernetes
Gaurav Nanda and Vincent Cheng, Databricks
Kubernetes relies on kube-proxy and DNS for simple Layer 4 load balancing, which works for short-lived HTTP traffic but fails for persistent connections and high-throughput gRPC workloads. With thousands of requests multiplexed over a single TCP connection, clusters often see uneven load, pod hot-spotting, and rising tail latency.
This talk presents a client-side, control-plane-driven approach that removes kube-proxy and DNS from the data path. A lightweight control plane tracks Service and EndpointSlice updates, while client libraries receive live endpoint changes through xDS and make per-request routing decisions at Layer 7. We show how strategies like Power of Two Choices and zone-affinity routing improve load balance, stabilize tail latency, and reduce resource waste in production.
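The client half of that design can be pictured as an in‑memory endpoint table that is replaced whenever the control plane pushes a new snapshot. The sketch below is a simplification under stated assumptions: `EndpointUpdate` and `EndpointTable` are hypothetical names, updates are modeled as full snapshots with an integer version, and none of the real xDS wire protocol (resource types, nonces, ACK/NACK handling) is represented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EndpointUpdate:
    """Hypothetical message: full snapshot of ready endpoints for one service."""
    service: str
    version: int
    addresses: frozenset


@dataclass
class EndpointTable:
    """Client-side view of endpoints, keyed by service name."""
    _versions: dict = field(default_factory=dict)
    _addresses: dict = field(default_factory=dict)

    def apply(self, update):
        """Apply a snapshot; return False for stale or duplicate versions."""
        if update.version <= self._versions.get(update.service, -1):
            return False
        self._versions[update.service] = update.version
        self._addresses[update.service] = update.addresses
        return True

    def endpoints(self, service):
        """Return the current addresses for a service in sorted order."""
        return sorted(self._addresses.get(service, frozenset()))


if __name__ == "__main__":
    table = EndpointTable()
    table.apply(EndpointUpdate("orders", 1, frozenset({"10.0.0.1:8080"})))
    table.apply(EndpointUpdate("orders", 2, frozenset({"10.0.0.2:8080"})))
    print(table.endpoints("orders"))
```

Rejecting snapshots whose version is not newer than the one already held keeps a delayed or replayed update from resurrecting pods that have since been removed.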
SREs and platform engineers will learn why default Kubernetes routing breaks down, how to design intelligent client-side load balancing, and what operational challenges emerge when deploying these systems at scale.
View the full SREcon26 Americas program at https://www.usenix.org/conference/srecon26americas/program
