Rafay – Blog

Publication

Kubernetes platform operations and automation.

Understanding LLM Inference Metrics in Rafay's Token Factory
News · Mar 27, 2026

Rafay’s Token Factory turns GPU clusters into managed LLM inference APIs with built‑in multi‑tenancy, token‑metered billing and auto‑scaling. The platform ships a metrics dashboard that surfaces latency (TTFT, ITL, E2E), throughput and KV‑cache utilization at multiple percentiles, letting operators gauge...

By Rafay – Blog
Flexible GPU Billing Models for AI Clouds: Powering the AI Factory with Rafay
News · Mar 25, 2026

Rafay announced the addition of a reservation‑based billing model to its GPU‑cloud platform, complementing existing on‑demand and monthly recurring charge options. The new feature guarantees customers access to a specified number of GPUs—such as 16 NVIDIA H200 units—for a fixed...

By Rafay – Blog
Kubernetes Makes GPUs First-Class: Advances in Allocation, Scheduling, and Isolation
News · Mar 25, 2026

At KubeCon Europe 2026, NVIDIA donated its Dynamic Resource Allocation (DRA) driver, saw the KAI scheduler graduate to a CNCF Sandbox project, and added GPU support to Kata Containers. These moves turn GPUs into first‑class, community‑owned resources in Kubernetes, enabling...

By Rafay – Blog
Eliminate SSH Access with Rafay MKS Control Plane Overrides
News · Mar 24, 2026

Rafay has introduced Control Plane Overrides for its Managed Kubernetes Service (MKS), allowing administrators to customize API Server, Controller Manager, and Scheduler settings without SSHing into master nodes. The declarative approach lets users define extra arguments, volumes, and mounts directly...

By Rafay – Blog
OpenClaw on Kubernetes: Designing Always-On AI as a Platform Service
News · Mar 23, 2026

OpenClaw is an open‑source, gateway‑centric runtime that turns generative AI into an always‑on service deployed on Kubernetes. It provides a unified onboarding flow for workspaces, channels and skills, and ships with a documented Kubernetes install path and operator. The platform...

By Rafay – Blog
A Self-Service GPU Experience That Feels Instant | Rafay
News · Mar 23, 2026

Rafay’s Developer Pods let developers request GPU‑enabled environments through a simple UI, bypassing tickets, YAML, and long wait times. Within roughly 30 seconds, a pod spins up and is reachable via SSH, offering pre‑built images such as Ubuntu and various...

By Rafay – Blog
Developer Pods for Platform Teams: Designing Self-Service GPU Experiences
News · Mar 23, 2026

Rafay’s SKU Studio enables platform teams to package GPU‑powered Kubernetes environments as ready‑to‑use Developer Pods. By defining curated SKUs with clear descriptions, guided inputs and prescriptive outputs, teams turn raw infrastructure into a self‑service product that launches in about 30...

By Rafay – Blog
Instant Developer Pods: Rethinking GPU Access for AI Teams | Rafay
News · Mar 23, 2026

Rafay’s Developer Pods redefine GPU access by delivering ready‑to‑use Ubuntu environments with CUDA in roughly 30 seconds, eliminating the multi‑day ticket queues and bulky VM provisioning that plague many enterprises. The solution abstracts Kubernetes away from developers, offering a simple...

By Rafay – Blog
Stop Paying for Unused Kubernetes Resources | Optimize Pod Efficiency
News · Mar 23, 2026

Kubernetes platforms often suffer from over‑provisioned pods as developers pad CPU and memory requests to avoid OOM or throttling. Rafay’s App Resizing feature, introduced in the 4.1 release, collects 30‑day utilization metrics and generates per‑pod reports comparing requests to P90,...

By Rafay – Blog
How Rafay & NVIDIA Help NeoClouds Monetize AI with Token Factories
News · Mar 18, 2026

The AI surge has spurred a new class of GPU‑first cloud providers, called neoclouds, that initially sold raw GPU capacity. Rafay’s Token Factory now lets these providers expose models as token‑metered APIs, turning infrastructure into a consumable AI service. Deep...

By Rafay – Blog
Rafay Launches AI Grid Orchestration Solution to Help Telcos Intelligently Deploy Distributed AI Infrastructure
News · Mar 17, 2026

Rafay, an NVIDIA Inception startup, unveiled an AI Grid orchestration platform that turns existing telco edge infrastructure into a self‑service, multi‑tenant AI factory. The solution lets operators express intent—such as latency, cost, or security requirements—and automatically places GPU workloads across...

By Rafay – Blog
From Infrastructure Validation to Market Validation: Rafay and NVIDIA DSX Air
News · Mar 16, 2026

NVIDIA DSX Air provides a full‑stack simulation that lets cloud providers validate networking, GPU servers, storage and connectivity before any rack is shipped. Rafay layers a self‑service orchestration platform on top, enabling multi‑tenancy, governance and workflow testing alongside the hardware...

By Rafay – Blog
AI Assistants for Kubernetes: Secure Cluster Operations with MCP and Rafay ZTKA
News · Mar 10, 2026

The Model Context Protocol (MCP) lets AI assistants run Kubernetes commands through a local server while Rafay’s Zero Trust Kubectl Access (ZTKA) supplies a secure, token‑less kubeconfig. This architecture places the MCP server on the admin workstation, routes traffic via...

By Rafay – Blog
Run GPU Hackathons at Scale: How Rafay Enables GPU Cloud Providers
News · Mar 10, 2026

Rafay’s platform lets GPU cloud operators provision and manage thousands of GPU‑backed Jupyter notebooks for hackathons through a declarative API and templated SKUs. By batching parallel API calls and using an inventory‑aware scheduler, operators can spin up 1,000 environments in...

By Rafay – Blog
Validate GPU Health in Kubernetes with Rafay Zero Trust Kubectl Access
News · Mar 10, 2026

Rafay’s zero‑trust kubectl lets operators run commands inside pods on remote GPU‑enabled Kubernetes clusters without exposing the API or using bastion hosts. Using this workflow, they open an exec session to the nvidia‑dcgm‑exporter pod and execute nvidia‑smi to verify driver,...

By Rafay – Blog
Rafay Joins VAST Cosmos to Enable Governed GPU-Powered AI Services
News · Feb 25, 2026

Rafay has joined the VAST Cosmos Community as a Technology Partner, aligning its AI‑native cloud control plane with VAST Data’s AI Operating System. The collaboration integrates Rafay’s orchestration platform with VAST’s governed storage services, creating a unified, multi‑tenant AI service...

By Rafay – Blog
What Is an AI Factory? Enterprise & Cloud Guide
News · Feb 17, 2026

An AI factory is an operational model that industrializes artificial‑intelligence development by linking high‑performance compute, data pipelines, orchestration, governance and deployment into a continuous production system. The concept, popularized by NVIDIA, moves AI from isolated experiments to repeatable, scalable outputs....

By Rafay – Blog
From Tickets to Self-Service AI Infrastructure
News · Feb 17, 2026

Many enterprises still provision AI resources through ticket systems, causing delays and underutilized GPUs. Modern developers now expect instant, self‑service access similar to hyperscaler offerings, making manual approval a competitive risk. The shift to automated, governed platforms improves utilization, speeds...

By Rafay – Blog
What Is Amazon EKS? EKS & EKS Anywhere Explained | Rafay
News · Feb 17, 2026

Amazon Elastic Kubernetes Service (EKS) dominates the managed Kubernetes market with roughly 50% share, offering a fully managed control plane, deep AWS integration, and serverless compute via Fargate. EKS Anywhere, launched in 2020, extends the same open‑source distro to on‑premise...

By Rafay – Blog
Migrating Existing Amazon EKS Clusters to EKS Auto Mode | Rafay
News · Feb 17, 2026

Amazon EKS Auto Mode automates node scaling, patching, and add‑on management, but AWS does not provide an automated path for migrating applications, storage, or ingress. Rafay offers a guided, cluster‑level migration process that includes converting to access entries, enabling Auto...

By Rafay – Blog