USENIX Association

SREcon, OSDI, and leading systems/infra research talks

Video•Jun 2, 2026

NSDI '26 - Co-Designing Traffic Control with NVMe-oF for Disaggregated Storage: A Comparative Study

The presentation compares two disaggregated-storage fabrics, a traditional switched SAN and a novel switchless mesh, and proposes traffic-control mechanisms co-designed with NVMe-oF. As data centers decouple compute from storage, rapidly scaling PCIe bandwidth and higher drive density force a choice between expensive high-port switches and cheaper multi-hop adapters. Three observations drive the design: network round-trip time is a tiny fraction of total I/O latency, NVMe-oF traffic is inherently pairwise and asymmetric, and SSD performance throttles the entire fabric, causing back-pressure that fills switch buffers. Existing congestion-control schemes ignore these traits, so the authors introduce in-network telemetry-assisted symmetric routing, eager bandwidth reservation, and storage-driven traffic scheduling. Experiments on four nodes equipped with Samsung PM93 SSDs show that both architectures deliver similar throughput, but the switchless mesh reduces average latency by 20-28% and tail latency by about 25% across YCSB workloads. Cost analysis indicates that at 1,000 ports the switchless design saves roughly 50% of capital expenditure compared to a fully switched fabric. The findings suggest that, for disaggregated storage, a switchless topology combined with traffic-aware control can deliver lower latency and substantial cost savings without sacrificing throughput, offering a practical blueprint for next-generation data-center storage networks.

By USENIX Association

Video•Jun 2, 2026

NSDI '26 - Defending Against Traffic Analysis Attacks with Flexible In-Network Obfuscation

The NSDI ’26 presentation introduced “Securities,” a flexible in‑network obfuscation system designed to thwart traffic‑analysis attacks without relying on external proxy services. By moving the obfuscation logic to the user’s edge network, the framework eliminates the need for cooperation from...

By USENIX Association

Video•Jun 1, 2026

NSDI '26 - Net-P4ct: Enhanced WAN Bandwidth Fair Sharing Using P4 Programmable Switches

The talk introduces Net-P4ct, a WAN‑wide bandwidth management system that shifts traffic policing from per‑host eBPF agents to line‑rate P4 programmable switches. By installing service‑specific policies at ingress points, Net-P4ct can recognize jobs via a unique identifier and enforce guaranteed...

By USENIX Association

Video•May 7, 2026

SREcon26 Americas - Intelligent Load Balancing in Kubernetes

The SREcon26 talk details Databricks’ effort to solve request‑imbalance issues in its Kubernetes‑based services by moving from the platform’s default load‑balancing to a custom, intelligent solution. Databricks discovered that Kubernetes distributes connections uniformly, not individual requests. Because their traffic relies heavily...

By USENIX Association

Video•Apr 23, 2026

SREcon26 Americas - Intelligent Load Balancing in Kubernetes

Databricks engineers Gaurav Nanda and Vincent Cheng revealed that Kubernetes’ default kube‑proxy and DNS model struggles with long‑lived HTTP streams and high‑throughput gRPC, leading to pod hot‑spotting and tail‑latency spikes. They propose a client‑side, control‑plane‑driven load balancer that removes kube‑proxy...

By USENIX Association

Video•Apr 23, 2026

SREcon26 Americas - Stop Reading Changelogs: Safer Kubernetes Upgrades with Simulation

The talk, “Stop Reading Changelogs: Safer Kubernetes Upgrades with Simulation,” opens with a vivid reminder of Reddit’s 314‑minute outage in March 2023, caused by a label change in a Kubernetes 1.23‑to‑1.24 upgrade that broke Calico’s node selectors. Speaker David “Dr....

By USENIX Association

Video•Apr 7, 2026

FAST '26 - AdaCheck: An Adaptive Checkpointing System for Efficient LLM Training with Redundancy...

The video introduces AdaCheck, an adaptive checkpointing framework designed to curb the massive resource waste inherent in large‑language‑model (LLM) training. By recognizing that parallelism and model‑parallel architectures create duplicated tensors across workers, the authors propose a system that dynamically trims...

By USENIX Association

Video•Apr 7, 2026

FAST '26 - Rearchitecting Buffered I/O in the Era of High-Bandwidth SSDs

The presentation, delivered by Chao of Hajing University of Science and Technology, tackles the growing mismatch between buffered I/O architectures and today’s ultra‑high‑bandwidth SSDs. Over the past 15 years, SSD throughput has leapt from roughly 500 MB/s to 28 GB/s, a 56‑fold increase, rendering the...

By USENIX Association

Video•Apr 2, 2026

OSDI '20 - AGAMOTTO: How Persistent Is Your Persistent Memory Application?

The presentation introduced AGAMOTTO, a symbolic‑execution framework designed to automatically uncover persistency bugs in applications that use Intel’s emerging persistent‑memory (PM) technology. By mapping PM directly into a process’s address space, developers can avoid file‑system overhead, but they must also...

By USENIX Association
