
Rafay Achieves CNCF Kubernetes AI Conformance for v1.35 | Rafay
Rafay's Managed Kubernetes Service (MKS) has earned CNCF Kubernetes AI Conformance for version 1.35, the newest industry standard for running AI/ML workloads on Kubernetes. The certification proves MKS meets mandatory requirements across accelerator management, gang scheduling, GPU‑aware autoscaling, deep observability and security isolation. The AI Conformance program, launched in late 2025, has grown to 31 certified platforms—a 70% increase—signaling rapid adoption. Rafay’s achievement positions it as an AI‑ready, portable platform for enterprises building GPU‑intensive applications.

Automated GPU Health Monitoring with NVIDIA NVSentinel on the Rafay Platform
The GPUs in modern clusters cost tens of thousands of dollars apiece, and a single hardware fault can halt AI training and inference, jeopardizing service‑level agreements. NVIDIA’s open‑source NVSentinel adds continuous health monitoring, automatic quarantine, and self‑healing to Kubernetes‑managed GPU fleets. Rafay integrates NVSentinel...
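The monitor → quarantine → heal loop described above can be sketched as a toy reconciliation function. This is not NVSentinel's implementation; the fault codes, state names, and thresholds below are invented for illustration only.

```python
# Toy fleet-health loop illustrating the monitor -> quarantine -> heal pattern.
# Xid codes and state names are illustrative, not NVSentinel's actual logic.
FATAL_XIDS = {48, 79, 94}   # example NVIDIA Xid error codes treated as fatal

def reconcile(nodes, faults):
    """Given node states and observed GPU faults, decide quarantine/restore actions.

    nodes:  {name: "healthy" | "quarantined"}
    faults: {name: xid} for nodes currently reporting a fault
    Returns the updated state map and the list of (node, action) taken.
    """
    actions = []
    for name, state in nodes.items():
        xid = faults.get(name)
        if state == "healthy" and xid in FATAL_XIDS:
            nodes[name] = "quarantined"       # cordon: stop scheduling new work here
            actions.append((name, "quarantine"))
        elif state == "quarantined" and xid is None:
            nodes[name] = "healthy"           # remediation succeeded; uncordon
            actions.append((name, "restore"))
    return nodes, actions

nodes = {"gpu-node-1": "healthy", "gpu-node-2": "healthy"}
nodes, acts = reconcile(nodes, {"gpu-node-2": 79})
```

In a real system the "quarantine" action would cordon the Kubernetes node and drain its workloads; the sketch only models the state transitions.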

AI Factories Will Be Won on Efficiency | Rafay + Kubex Partnership
Enterprises are moving from AI experimentation to building "AI factories"—repeatable, governed platforms that can train, deploy, and operate models at scale. Rafay and Kubex announced a partnership that combines Rafay's Kubernetes‑based AI orchestration with Kubex's autonomous GPU optimization. The joint...

Compute Domains & Multi-Node NVLink in Kubernetes: Scaling GPU Workloads
NVIDIA’s ComputeDomains add a Kubernetes‑native layer that dynamically creates and tears down multi‑node NVLink communication groups for GPU workloads. By extending the Dynamic Resource Allocation driver, the feature makes cross‑node bandwidth a schedulable resource rather than a static configuration. This...

The Telco AI Imperative: From Connectivity to Sovereign AI Infrastructure
Telecommunications operators are poised to shift from pure connectivity to sovereign AI infrastructure by leveraging their distributed data centers, edge fiber, and regulatory trust. The surge in AI training and inference workloads creates demand for GPU‑as‑a‑Service, token‑as‑a‑service, and AI marketplaces...
Token Factory GA: Monetize AI with Token-Based APIs | Rafay
Rafay Systems has launched Token Factory, a token‑metered API layer for AI models that lets neocloud and AI‑factory operators monetize compute without building their own orchestration stack. The announcement coincides with the industry’s pivot to token‑based commerce, a trend underscored...

Understanding LLM Inference Metrics in Rafay's Token Factory
Rafay’s Token Factory turns GPU clusters into managed LLM inference APIs with built‑in multi‑tenancy, token‑metered billing and auto‑scaling. The platform ships a metrics dashboard that surfaces latency (TTFT, ITL, E2E), throughput and KV‑cache utilization at multiple percentiles, letting operators gauge...
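The latency metrics named above can be derived from per-token arrival timestamps. The sketch below shows one minimal way to compute TTFT, inter-token latency, and end-to-end latency; the function and field names are illustrative and do not reflect Token Factory's API.

```python
import statistics

def inference_metrics(request_start, token_times):
    """Derive basic LLM latency metrics from one request's token timestamps.

    request_start: wall-clock time the request was sent (seconds)
    token_times:   wall-clock times at which each output token arrived
    """
    ttft = token_times[0] - request_start                         # time to first token
    itls = [b - a for a, b in zip(token_times, token_times[1:])]  # inter-token latencies
    e2e = token_times[-1] - request_start                         # end-to-end latency
    return {
        "ttft": ttft,
        "itl_p50": statistics.median(itls),
        "e2e": e2e,
        "throughput_tok_s": len(token_times) / e2e,
    }

# Example: request sent at t=0, 10 tokens arriving every 50 ms after a 200 ms TTFT
times = [0.2 + 0.05 * i for i in range(10)]
m = inference_metrics(0.0, times)
```

A production dashboard would aggregate these per-request numbers across many requests to report P50/P90/P99 percentiles, as the article's dashboard does.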
Flexible GPU Billing Models for AI Clouds: Powering the AI Factory with Rafay
Rafay announced the addition of a reservation‑based billing model to its GPU‑cloud platform, complementing existing on‑demand and monthly recurring charge options. The new feature guarantees customers access to a specified number of GPUs—such as 16 NVIDIA H200 units—for a fixed...
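The trade-off between reservation-based and on-demand billing comes down to simple arithmetic: reserved GPUs are billed every hour of the month at a discounted rate, while overflow demand pays the on-demand rate. The sketch below uses invented $/GPU-hour prices for illustration; it is not Rafay's pricing model.

```python
def monthly_gpu_cost(gpus, hours, on_demand_rate, reserved_rate=None, reserved_gpus=0):
    """Compare GPU spend under pure on-demand vs. a fixed reservation.

    Rates are illustrative $/GPU-hour values, not Rafay pricing.
    Reserved GPUs bill for every hour of the month whether used or not;
    demand beyond the reservation spills over to on-demand.
    """
    if reserved_rate is None:
        return gpus * hours * on_demand_rate
    reserved_cost = reserved_gpus * 730 * reserved_rate   # ~730 hours per month
    overflow = max(gpus - reserved_gpus, 0)
    return reserved_cost + overflow * hours * on_demand_rate

# 16 H200s used 600 h each this month: pure on-demand vs. a 16-GPU reservation
on_demand = monthly_gpu_cost(16, 600, on_demand_rate=3.50)
reserved = monthly_gpu_cost(16, 600, on_demand_rate=3.50,
                            reserved_rate=2.20, reserved_gpus=16)
```

At high, steady utilization the reservation wins despite billing idle hours, which is exactly the guaranteed-capacity scenario the feature targets.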

Kubernetes Makes GPUs First-Class: Advances in Allocation, Scheduling, and Isolation
At KubeCon Europe 2026, NVIDIA donated its Dynamic Resource Allocation (DRA) driver to the community, its KAI scheduler graduated to a CNCF Sandbox project, and GPU support landed in Kata Containers. These moves turn GPUs into first‑class, community‑owned resources in Kubernetes, enabling...

Eliminate SSH Access with Rafay MKS Control Plane Overrides
Rafay has introduced Control Plane Overrides for its Managed Kubernetes Service (MKS), allowing administrators to customize API Server, Controller Manager, and Scheduler settings without SSHing into master nodes. The declarative approach lets users define extra arguments, volumes, and mounts directly...
OpenClaw on Kubernetes: Designing Always-On AI as a Platform Service
OpenClaw is an open‑source, gateway‑centric runtime that turns generative AI into an always‑on service deployed on Kubernetes. It provides a unified onboarding flow for workspaces, channels and skills, and ships with a documented Kubernetes install path and operator. The platform...

A Self-Service GPU Experience That Feels Instant | Rafay
Rafay’s Developer Pods let developers request GPU‑enabled environments through a simple UI, bypassing tickets, YAML, and long wait times. Within roughly 30 seconds, a pod spins up and is reachable via SSH, offering pre‑built images such as Ubuntu and various...
Developer Pods for Platform Teams: Designing Self-Service GPU Experiences
Rafay’s SKU Studio enables platform teams to package GPU‑powered Kubernetes environments as ready‑to‑use Developer Pods. By defining curated SKUs with clear descriptions, guided inputs and prescriptive outputs, teams turn raw infrastructure into a self‑service product that launches in about 30...

Instant Developer Pods: Rethinking GPU Access for AI Teams | Rafay
Rafay’s Developer Pods redefine GPU access by delivering ready‑to‑use Ubuntu environments with CUDA in roughly 30 seconds, eliminating the multi‑day ticket queues and bulky VM provisioning that plague many enterprises. The solution abstracts Kubernetes away from developers, offering a simple...

Stop Paying for Unused Kubernetes Resources | Optimize Pod Efficiency
Kubernetes platforms often suffer from over‑provisioned pods as developers pad CPU and memory requests to avoid OOM or throttling. Rafay’s App Resizing feature, introduced in the 4.1 release, collects 30‑day utilization metrics and generates per‑pod reports comparing requests to P90,...
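The request-vs-utilization comparison described above can be sketched as a small analysis function: take a pod's declared request, compute the P90 of its observed usage, and flag over-provisioning. The percentile method, headroom factor, and field names are illustrative assumptions, not App Resizing's actual report schema.

```python
import random

def resize_report(request_mib, samples_mib, percentile=0.90, headroom=1.2):
    """Compare a pod's memory request to its observed P90 usage.

    A sketch in the spirit of Rafay's App Resizing report; the headroom
    factor and recommendation logic are invented for illustration.
    """
    ordered = sorted(samples_mib)
    p90 = ordered[int(percentile * (len(ordered) - 1))]   # nearest-rank P90
    recommended = round(p90 * headroom)                   # P90 plus safety headroom
    return {
        "request_mib": request_mib,
        "p90_mib": p90,
        "recommended_mib": recommended,
        "over_provisioned": request_mib > recommended,
    }

# Pod requests 4096 MiB, but 30 days of hourly samples hover around 700-1100 MiB
random.seed(7)
samples = [random.randint(700, 1100) for _ in range(30 * 24)]
report = resize_report(4096, samples)
```

Run across a fleet, reports like this make padded requests visible per pod, which is the waste the article's feature targets.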