Open-Source "GreenBoost" Driver Aims To Augment NVIDIA GPUs' VRAM With System RAM & NVMe To Handle Larger LLMs


Phoronix
Mar 15, 2026

Key Takeaways

  • GreenBoost adds system RAM as GPU memory
  • NVMe storage used as secondary memory tier
  • Works via kernel module and LD_PRELOAD shim
  • Enables 31.8 GB LLM on 12 GB RTX 5070
  • Released under GPLv2; complements the NVIDIA driver without modification

Summary

GreenBoost is an open‑source Linux kernel module that extends NVIDIA GPU VRAM by allocating pinned system RAM and NVMe storage as CUDA‑accessible memory. It pairs a kernel driver with an LD_PRELOAD shim that intercepts allocation calls, redirecting large buffers to the extended pool while keeping small allocations on‑board. The solution enables a 31.8 GB LLM to run on a consumer‑grade RTX 5070 with only 12 GB of native VRAM, without modifying NVIDIA’s proprietary driver. The project is released under GPL‑v2 and hosted on GitLab.

Pulse Analysis

The memory ceiling of discrete GPUs has become a primary bottleneck for deploying large language models (LLMs) in production. Traditional approaches—model quantization, CPU off‑loading, or multi‑GPU sharding—either degrade accuracy or demand costly hardware. GreenBoost tackles this limitation by presenting a software‑only memory tiering strategy that leverages existing system resources, effectively turning a standard workstation into a more capable AI inference platform.

Technically, GreenBoost introduces a kernel module that allocates pinned DDR4 pages in 2 MB chunks and exports them via DMA‑BUF file descriptors. The CUDA shim, injected through LD_PRELOAD, intercepts allocation APIs and redirects oversized buffers to the external pool, which the GPU accesses over a PCIe 4.0 ×16 link delivering roughly 32 GB/s bandwidth. This architecture preserves CUDA coherence, allowing frameworks like Ollama to perceive the expanded memory as native VRAM. Because the module operates alongside NVIDIA’s official driver, it requires no firmware changes, simplifying deployment for Linux users.

From a business perspective, the ability to run roughly 32 GB models on a $400 RTX 5070 democratizes AI capabilities that were previously confined to high‑end GPUs costing several times as much. Enterprises can defer capital expenditures while still delivering cutting‑edge NLP services. However, performance will still be bounded by PCIe latency and NVMe throughput, making GreenBoost most suitable for inference workloads where occasional memory spills are acceptable. As the open‑source community refines the code and benchmarks mature, GreenBoost could become a standard component of cost‑effective AI stacks, prompting hardware vendors to consider tighter CPU‑GPU memory integration in future designs.
