Token Factory GA: Monetize AI with Token-Based APIs | Rafay
Rafay Systems has launched Token Factory, a token‑metered API layer for AI models that lets neocloud and AI‑factory operators monetize compute without building their own orchestration stack. The announcement coincides with the industry’s pivot to token‑based commerce, a trend underscored by Jensen Huang at GTC 2026. The GPU‑as‑a‑Service market is projected to reach $26.43 billion by 2031, creating a lucrative opportunity for operators who can sell token plans instead of raw GPU hours. Early adopters across six continents are already deploying the solution.

Understanding LLM Inference Metrics in Rafay's Token Factory
Rafay’s Token Factory turns GPU clusters into managed LLM inference APIs with built‑in multi‑tenancy, token‑metered billing and auto‑scaling. The platform ships a metrics dashboard that surfaces latency (TTFT, ITL, E2E), throughput and KV‑cache utilization at multiple percentiles, letting operators gauge...
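As a rough illustration of how these latency metrics relate, here is a minimal sketch. It assumes each request is recorded as a send time plus the arrival time of every output token. The `RequestTrace` type and the helper names are hypothetical and not part of Token Factory. The sketch uses common definitions: TTFT is the first token minus the send time, ITL is the mean gap between consecutive tokens, and E2E is the last token minus the send time. Percentiles use the nearest-rank method. Rafay's dashboard may compute these values differently.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Hypothetical per-request record: send time and each output token's arrival time (seconds)."""
    start: float
    token_times: list[float]


def ttft(r: RequestTrace) -> float:
    # Time to first token: first token arrival minus request send time.
    return r.token_times[0] - r.start


def mean_itl(r: RequestTrace) -> float:
    # Inter-token latency: average gap between consecutive output tokens.
    gaps = [b - a for a, b in zip(r.token_times, r.token_times[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0


def e2e(r: RequestTrace) -> float:
    # End-to-end latency: last token arrival minus request send time.
    return r.token_times[-1] - r.start


def percentile(values: list[float], p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(traces: list[RequestTrace], ps: tuple[int, ...] = (50, 90, 99)) -> dict[str, dict[str, float]]:
    # Compute each metric per request, then report it at the requested percentiles.
    series = {
        "ttft": [ttft(t) for t in traces],
        "itl": [mean_itl(t) for t in traces],
        "e2e": [e2e(t) for t in traces],
    }
    return {name: {f"p{p}": percentile(vals, p) for p in ps} for name, vals in series.items()}
```

For example, `summarize([RequestTrace(0.0, [0.2, 0.25, 0.3]), RequestTrace(1.0, [1.5, 1.6, 1.7, 1.8])])` reports a P90 TTFT of 0.5 s. Under these definitions, E2E equals TTFT plus (n - 1) times the mean ITL for a request with n output tokens. A high E2E can therefore come from either slow prefill (TTFT) or slow decode (ITL).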
Flexible GPU Billing Models for AI Clouds: Powering the AI Factory with Rafay
Rafay announced the addition of a reservation‑based billing model to its GPU‑cloud platform, complementing existing on‑demand and monthly recurring charge options. The new feature guarantees customers access to a specified number of GPUs—such as 16 NVIDIA H200 units—for a fixed...

Kubernetes Makes GPUs First-Class: Advances in Allocation, Scheduling, and Isolation
At KubeCon Europe 2026, NVIDIA donated its Dynamic Resource Allocation (DRA) driver, saw the KAI Scheduler accepted as a CNCF Sandbox project, and added GPU support to Kata Containers. These moves turn GPUs into first‑class, community‑owned resources in Kubernetes, enabling...

Eliminate SSH Access with Rafay MKS Control Plane Overrides
Rafay has introduced Control Plane Overrides for its Managed Kubernetes Service (MKS), allowing administrators to customize API Server, Controller Manager, and Scheduler settings without SSHing into master nodes. The declarative approach lets users define extra arguments, volumes, and mounts directly...
OpenClaw on Kubernetes: Designing Always-On AI as a Platform Service
OpenClaw is an open‑source, gateway‑centric runtime that turns generative AI into an always‑on service deployed on Kubernetes. It provides a unified onboarding flow for workspaces, channels and skills, and ships with a documented Kubernetes install path and operator. The platform...

A Self-Service GPU Experience That Feels Instant | Rafay
Rafay’s Developer Pods let developers request GPU‑enabled environments through a simple UI, bypassing tickets, YAML, and long wait times. Within roughly 30 seconds, a pod spins up and is reachable via SSH, offering pre‑built images such as Ubuntu and various...
Developer Pods for Platform Teams: Designing Self-Service GPU Experiences
Rafay’s SKU Studio enables platform teams to package GPU‑powered Kubernetes environments as ready‑to‑use Developer Pods. By defining curated SKUs with clear descriptions, guided inputs and prescriptive outputs, teams turn raw infrastructure into a self‑service product that launches in about 30...

Instant Developer Pods: Rethinking GPU Access for AI Teams | Rafay
Rafay’s Developer Pods redefine GPU access by delivering ready‑to‑use Ubuntu environments with CUDA in roughly 30 seconds, eliminating the multi‑day ticket queues and bulky VM provisioning that plague many enterprises. The solution abstracts Kubernetes away from developers, offering a simple...

Stop Paying for Unused Kubernetes Resources | Optimize Pod Efficiency
Kubernetes platforms often suffer from over‑provisioned pods because developers pad CPU and memory requests to avoid OOM kills or CPU throttling. Rafay’s App Resizing feature, introduced in the 4.1 release, collects 30‑day utilization metrics and generates per‑pod reports comparing requests to P90,...
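Here is a minimal sketch of a request‑versus‑P90 comparison. It assumes 30 days of CPU usage samples per pod, stored as integer millicores. The `ResizeRow` type, the function names, the nearest‑rank P90, and the 20% headroom policy are all illustrative assumptions, not Rafay's App Resizing algorithm. Memory could be handled the same way with byte samples.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ResizeRow:
    """Hypothetical report row for one pod (CPU values in millicores)."""
    pod: str
    request_m: int    # current CPU request
    p90_m: int        # observed P90 usage over the sample window
    suggested_m: int  # P90 plus headroom (assumed policy)

    @property
    def over_provisioned(self) -> bool:
        return self.request_m > self.suggested_m

    @property
    def reclaimable_m(self) -> int:
        # Millicores that could be released if the request were lowered to the suggestion.
        return max(0, self.request_m - self.suggested_m)


def p90(samples: list[int]) -> int:
    """Nearest-rank 90th percentile using integer math to avoid float rounding."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n)
    return ordered[rank - 1]


def resize_row(pod: str, request_m: int, samples: list[int], headroom_pct: int = 20) -> ResizeRow:
    observed = p90(samples)
    # ceil(observed * (1 + headroom_pct / 100)) in integer arithmetic
    suggested = (observed * (100 + headroom_pct) + 99) // 100
    return ResizeRow(pod, request_m, observed, suggested)
```

For a hypothetical pod requesting 500m whose usage samples are 1m through 100m, `resize_row("checkout-api", 500, list(range(1, 101)))` yields a P90 of 90m and a suggestion of 108m. That flags 392m as reclaimable. Sizing to a percentile plus headroom, rather than to the peak, accepts rare bursts in exchange for tighter packing.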

How Rafay & NVIDIA Help NeoClouds Monetize AI with Token Factories
The AI surge has spurred a new class of GPU‑first cloud providers, called neoclouds, that initially sold raw GPU capacity. Rafay’s Token Factory now lets these providers expose models as token‑metered APIs, turning infrastructure into a consumable AI service. Deep...

Rafay Launches AI Grid Orchestration Solution to Help Telcos Intelligently Deploy Distributed AI Infrastructure
Rafay, an NVIDIA Inception startup, unveiled an AI Grid orchestration platform that turns existing telco edge infrastructure into a self‑service, multi‑tenant AI factory. The solution lets operators express intent—such as latency, cost, or security requirements—and automatically places GPU workloads across...

From Infrastructure Validation to Market Validation: Rafay and NVIDIA DSX Air
NVIDIA DSX Air provides a full‑stack simulation that lets cloud providers validate networking, GPU servers, storage and connectivity before any rack is shipped. Rafay layers a self‑service orchestration platform on top, enabling multi‑tenant, governance and workflow testing alongside the hardware...
AI Assistants for Kubernetes: Secure Cluster Operations with MCP and Rafay ZTKA
The Model Context Protocol (MCP) lets AI assistants run Kubernetes commands through a local server while Rafay’s Zero Trust Kubectl Access (ZTKA) supplies a secure, token‑less kubeconfig. This architecture places the MCP server on the admin workstation, routes traffic via...

Run GPU Hackathons at Scale: How Rafay Enables GPU Cloud Providers
Rafay’s platform lets GPU cloud operators provision and manage thousands of GPU‑backed Jupyter notebooks for hackathons through a declarative API and templated SKUs. By batching parallel API calls and using an inventory‑aware scheduler, operators can spin up 1,000 environments in...
Validate GPU Health in Kubernetes with Rafay Zero Trust Kubectl Access
Rafay’s zero‑trust kubectl lets operators run commands inside pods on remote GPU‑enabled Kubernetes clusters without exposing the API or using bastion hosts. Using this workflow, they open an exec session to the nvidia‑dcgm‑exporter pod and execute nvidia‑smi to verify driver,...

Rafay Joins VAST Cosmos to Enable Governed GPU-Powered AI Services
Rafay has joined the VAST Cosmos Community as a Technology Partner, aligning its AI‑native cloud control plane with VAST Data’s AI Operating System. The collaboration integrates Rafay’s orchestration platform with VAST’s governed storage services, creating a unified, multi‑tenant AI service...

What Is an AI Factory? Enterprise & Cloud Guide
An AI factory is an operational model that industrializes artificial‑intelligence development by linking high‑performance compute, data pipelines, orchestration, governance and deployment into a continuous production system. The concept, popularized by NVIDIA, moves AI from isolated experiments to repeatable, scalable outputs....

From Tickets to Self-Service AI Infrastructure
Many enterprises still provision AI resources through ticket systems, causing delays and underutilized GPUs. Modern developers now expect instant, self‑service access similar to hyperscaler offerings, making manual approval a competitive risk. The shift to automated, governed platforms improves utilization, speeds...

What Is Amazon EKS? EKS & EKS Anywhere Explained | Rafay
Amazon Elastic Kubernetes Service (EKS) dominates the managed Kubernetes market with a roughly 50% share, offering a fully managed control plane, deep AWS integration, and serverless compute via Fargate. EKS Anywhere, launched in 2020, extends the same open‑source distro to on‑premises...
Migrating Existing Amazon EKS Clusters to EKS Auto Mode | Rafay
Amazon EKS Auto Mode automates node scaling, patching, and add‑on management, but AWS does not provide an automated path for migrating applications, storage, or ingress. Rafay offers a guided, cluster‑level migration process that includes converting to access entries, enabling Auto...