
The release cuts operational friction for AI teams, enabling faster scaling and more reliable production pipelines. By embedding AI‑specific primitives into the core platform, Kubernetes solidifies its role as the shared operating system for mixed workloads.
Kubernetes has long been the de facto substrate for cloud-native workloads, but AI/ML workloads stress the scheduler, resource model, and configuration pipelines in unique ways. Version 1.35 addresses these pressures with workload-aware scheduling, an alpha feature that lets platform teams declare groups of Pods that must be placed together. Coupled with an initial gang-scheduling implementation, this reduces fragmented placement of distributed training jobs, freeing capacity and cutting time-to-completion for large-scale model training.
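The exact shape of the new alpha API may differ from what ships in 1.35, but the idea mirrors the coscheduling pattern already available from the sig-scheduling scheduler-plugins project: a PodGroup declares a minimum member count, and no Pod in the group is bound until the whole group can be placed. A minimal sketch of that existing pattern (the CRD group, label key, and `minMember` field come from scheduler-plugins, not from the new in-tree feature; the image is hypothetical):

```yaml
# PodGroup from the scheduler-plugins coscheduling plugin:
# nothing in the group is bound until at least minMember Pods can be scheduled.
apiVersion: scheduling.x-k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: trainer
spec:
  minMember: 4        # all-or-nothing placement for a 4-worker training job
---
apiVersion: v1
kind: Pod
metadata:
  name: trainer-worker-0
  labels:
    # Membership is declared by labeling each Pod with the group name.
    pod-group.scheduling.x-k8s.io: trainer
spec:
  containers:
  - name: worker
    image: registry.example.com/trainer:latest   # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1
```

The all-or-nothing semantics are what prevent the classic deadlock where two half-scheduled training jobs each hold GPUs the other needs.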
The promotion of in-place Pod resource resizing to stable is another significant step for inference services that require rapid tuning. Operators can now adjust CPU or memory requests and limits without triggering container restarts, preserving stateful connections and avoiding latency spikes during bursts of load. For long-running batch jobs, the ability to resize on the fly improves cluster utilization and reduces the operational overhead of rolling updates. Together with continued support for Dynamic Resource Allocation, teams gain a more predictable path to orchestrating GPUs and other accelerators, a critical factor as model sizes and training demands grow.
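As a minimal sketch, a container opts in to restart-free resizing via `resizePolicy`, and an operator then changes its resources through the Pod's `resize` subresource rather than by editing the spec directly (the Pod, container, and image names here are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-server        # hypothetical name
spec:
  containers:
  - name: model
    image: registry.example.com/model-server:latest   # hypothetical image
    resizePolicy:
    # NotRequired tells the kubelet to apply the new values
    # without restarting the container.
    - resourceName: cpu
      restartPolicy: NotRequired
    - resourceName: memory
      restartPolicy: NotRequired
    resources:
      requests: {cpu: "1", memory: 2Gi}
      limits: {cpu: "2", memory: 4Gi}
```

A later resize is then applied against the subresource, e.g. `kubectl patch pod inference-server --subresource resize --patch '{"spec":{"containers":[{"name":"model","resources":{"limits":{"cpu":"4"}}}]}}'`, leaving established connections to the container intact.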
Beyond the core scheduling and scaling improvements, 1.35 tightens the "last mile" of configuration management by making KYAML the default output format for kubectl. This stricter YAML subset eliminates ambiguous constructs that often cause CI failures or drift between environments. At the same time, the announced retirement of Ingress NGINX by March 2026 forces platform engineers to evaluate alternative ingress controllers, aligning with broader security and support strategies. For organizations treating Kubernetes as a unified AI operating system, these enhancements reduce bespoke tooling, reinforce governance, and accelerate the path from experiment to production.
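KYAML stays parseable by any YAML tool while dropping the constructs that bite in CI, chiefly significant whitespace and implicit typing. The snippet below is an illustrative sketch of the style (flow-style braces, unquoted keys, always-double-quoted string values; the object itself is hypothetical, and exact kubectl rendering may differ):

```yaml
{
  apiVersion: "v1",
  kind: "ConfigMap",
  metadata: {
    name: "app-config",
    namespace: "default",
  },
  data: {
    # Double-quoting every string value avoids the "Norway problem",
    # where plain `no` or `1.10` silently parse as a boolean or float.
    country: "no",
    version: "1.10",
  },
}
```

Because structure comes from braces rather than indentation, copy-paste and templating mistakes that silently change meaning in plain YAML tend to surface as parse errors instead.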