SREcon26 Americas - Intelligent Load Balancing in Kubernetes

USENIX Association
May 7, 2026

Why It Matters

By eliminating request skew at scale, Databricks reduces error rates, improves latency, and saves compute resources, a critical advantage for any multi‑cloud, high‑throughput SaaS provider.

Key Takeaways

  • Kubernetes balances connections, not requests, causing request skew.
  • gRPC over HTTP/2 multiplexing amplifies pod traffic imbalance.
  • Resetting connections, headless services, and service mesh proved insufficient.
  • Custom intelligent load balancer built for fairness, efficiency, and zone awareness.
  • Push‑based client‑side routing eliminates extra hops and improves latency.

Summary

The SREcon26 talk details Databricks’ effort to solve request‑imbalance issues in its Kubernetes‑based services by replacing the platform’s default load balancing with a custom, intelligent load balancer.

Databricks found that Kubernetes spreads connections evenly across pods, not individual requests. Because its traffic relies heavily on gRPC over HTTP/2, a single long‑lived connection can carry thousands of requests. The result was 4‑5× traffic skew across pods, which caused 5xx errors and P99 latency spikes.
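The gap between connection‑level and request‑level balancing can be illustrated with a small simulation. This is a minimal sketch, not taken from the talk: the pod count, client count, and per‑client request volumes are made‑up assumptions, and `simulate` and `skew` are hypothetical helper names.

```python
import random


def simulate(num_pods=10, num_clients=40, seed=7):
    """Return per-pod request counts for connection-level and request-level balancing."""
    rng = random.Random(seed)
    # Hypothetical workload: clients differ widely in how many requests they send.
    requests_per_client = [rng.choice([10, 100, 1000, 5000]) for _ in range(num_clients)]

    per_connection = [0] * num_pods  # each client is pinned to one pod for its whole connection
    per_request = [0] * num_pods     # every request is routed independently (round-robin)
    next_pod = 0
    for count in requests_per_client:
        # Connection-level: the pod is chosen once, then all requests follow it.
        pinned_pod = rng.randrange(num_pods)
        per_connection[pinned_pod] += count
        # Request-level: each request may land on a different pod.
        for _ in range(count):
            per_request[next_pod] += 1
            next_pod = (next_pod + 1) % num_pods
    return per_connection, per_request


def skew(counts):
    """Ratio of the busiest pod's load to the mean load (1.0 means perfectly even)."""
    return max(counts) / (sum(counts) / len(counts))


if __name__ == "__main__":
    by_connection, by_request = simulate()
    print(f"connection-level skew: {skew(by_connection):.2f}")
    print(f"request-level skew:    {skew(by_request):.2f}")
```

Both strategies serve the same total number of requests. Only the connection‑pinned one lets a handful of heavy clients concentrate load on a few pods.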

The initial mitigations each fell short: periodic connection resets added CPU overhead, headless services for client‑side DNS load balancing ran into DNS‑caching limits, and a full service mesh introduced prohibitive proxy hops. One line quoted from the talk summed it up: “all pods are equal, but some pods are more equal than others.”

The team ultimately built a push‑based, client‑library‑driven balancer that incorporates pod load, health, and zone metadata, delivering uniform request distribution without extra network hops. This architecture enables Databricks to run 1,500+ clusters across three clouds while maintaining low latency and cost efficiency.
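The talk description names Power of Two Choices and zone‑affinity routing as strategies used on the client side. A per‑request picker combining load, health, and zone might look roughly like the sketch below. It is an illustration under stated assumptions, not Databricks’ implementation: the `Endpoint` fields, the use of in‑flight request count as the load signal, the fallback to other zones, and the addresses and zone names are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Endpoint:
    address: str
    zone: str
    healthy: bool = True
    in_flight: int = 0  # assumed load signal: requests currently outstanding


def pick_endpoint(endpoints, client_zone, rng=random):
    """Power of Two Choices among healthy endpoints, preferring the client's zone."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        raise RuntimeError("no healthy endpoints available")
    # Zone affinity: stay in the client's zone when possible, else use any zone.
    local = [e for e in healthy if e.zone == client_zone]
    candidates = local or healthy
    if len(candidates) == 1:
        return candidates[0]
    # Power of Two Choices: sample two distinct candidates, keep the less loaded one.
    first, second = rng.sample(candidates, 2)
    return first if first.in_flight <= second.in_flight else second


if __name__ == "__main__":
    pods = [
        Endpoint("10.0.0.1:443", "zone-a", in_flight=5),
        Endpoint("10.0.0.2:443", "zone-a", in_flight=0),
        Endpoint("10.0.1.1:443", "zone-b", in_flight=0),
    ]
    print(pick_endpoint(pods, "zone-a").address)
```

Sampling two candidates instead of scanning all of them keeps each decision cheap. It also avoids every client rushing to whichever pod looks least loaded at that moment. A production balancer would refresh the endpoint list from pushed updates rather than from a static list as shown here.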

Original Description

Intelligent Load Balancing in Kubernetes
Gaurav Nanda and Vincent Cheng, Databricks
Kubernetes relies on kube-proxy and DNS for simple Layer 4 load balancing, which works for short-lived HTTP traffic but fails for persistent connections and high-throughput gRPC workloads. With thousands of requests multiplexed over a single TCP connection, clusters often see uneven load, pod hot-spotting, and rising tail latency.
This talk presents a client-side, control-plane-driven approach that removes kube-proxy and DNS from the data path. A lightweight control plane tracks Service and EndpointSlice updates, while client libraries receive live endpoint changes through xDS and make per-request routing decisions at Layer 7. We show how strategies like Power of Two Choices and zone-affinity routing improve load balance, stabilize tail latency, and reduce resource waste in production.
SREs and platform engineers will learn why default Kubernetes routing breaks down, how to design intelligent client-side load balancing, and what operational challenges emerge when deploying these systems at scale.
View the full SREcon26 Americas program at https://www.usenix.org/conference/srecon26americas/program
