NSDI '26 - Net-P4ct: Enhanced WAN Bandwidth Fair Sharing Using P4 Programmable Switches

USENIX Association, Jun 1, 2026

Why It Matters

By moving bandwidth enforcement from host agents into programmable switches, Net-P4ct cuts host CPU load and improves WAN utilization, enabling faster, more reliable service provisioning at lower cost.

Key Takeaways

  • Net-P4ct moves bandwidth enforcement from hosts to P4 switches.
  • Host-based enforcement approaches 15% CPU overhead at 16,000 concurrent flows; Net-P4ct stays under 3%.
  • Guarantees minimum bandwidth while allowing work‑conserving excess sharing.
  • Uses DSCP marking and weighted fair queuing for enforcement.
  • Scalable across clusters; failsafe fallback ensures continuous connectivity.

Summary

The talk introduces Net-P4ct, a WAN-wide bandwidth management system that shifts traffic policing from per-host eBPF agents to line-rate P4 programmable switches. By installing service-specific policies at ingress points, Net-P4ct recognizes jobs via a unique identifier and enforces guaranteed and weighted-fair allocations across distributed data centers. The motivation is the cost of host agents: millions of agents would consume tens of thousands of cores, and they must be maintained across dozens of kernel versions. Net-P4ct defines a job model with reserved and non-reserved capacity pools, applies a water-filling algorithm for weighted fairness, and replaces traditional rate limiting with DSCP (Differentiated Services Code Point) marking that drives strict-priority queuing on the switches.

Operational results show that, even with 16,000 concurrent flows, Net-P4ct's CPU overhead stays under 3% while a host-based solution approaches 15%. The system also supports cluster scaling, active failure probing, and a fallback path that bypasses the P4 cluster entirely, ensuring uninterrupted service. For operators, Net-P4ct delivers higher link utilization, predictable service-level guarantees, and a scalable enforcement plane that can be extended to other silicon platforms, reducing both capital and operational expenditures.
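
To make the allocation idea in the summary concrete, here is a minimal sketch of weighted max-min fair sharing by water-filling on a single link. It is an illustration under stated assumptions, not the system's actual algorithm: the `Job` fields, the `allocate` function, the two-phase split (guarantees first, then weighted sharing of the excess), and all numbers are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass

EPS = 1e-9  # tolerance for floating-point comparisons


@dataclass
class Job:
    name: str
    demand: float     # bandwidth the job currently wants (e.g. Gbps)
    guarantee: float  # reserved minimum bandwidth (hypothetical field)
    weight: float     # share of the non-reserved pool (hypothetical field)


def allocate(jobs, capacity):
    """Return {job name: bandwidth} for one link of the given capacity."""
    # Phase 1: give each job its guarantee, capped by what it actually wants.
    alloc = {j.name: min(j.demand, j.guarantee) for j in jobs}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("guarantees exceed link capacity")

    # Phase 2: weighted water-filling over the leftover capacity.
    active = [j for j in jobs if j.demand - alloc[j.name] > EPS]
    while active and remaining > EPS:
        total_weight = sum(j.weight for j in active)
        level = remaining / total_weight  # bandwidth per unit of weight
        # Jobs whose unmet demand fits within their weighted share saturate.
        done = {j.name for j in active
                if j.demand - alloc[j.name] <= level * j.weight + EPS}
        if not done:
            # Nobody saturates: split everything by weight and stop.
            for j in active:
                alloc[j.name] += level * j.weight
            break
        for j in active:
            if j.name in done:
                remaining -= j.demand - alloc[j.name]
                alloc[j.name] = j.demand
        # Capacity freed by saturated jobs is re-shared in the next round.
        active = [j for j in active if j.name not in done]
    return alloc


if __name__ == "__main__":
    jobs = [
        Job("A", demand=50, guarantee=20, weight=1),
        Job("B", demand=30, guarantee=20, weight=1),
        Job("C", demand=100, guarantee=10, weight=2),
    ]
    for name, bw in allocate(jobs, capacity=100).items():
        print(f"{name}: {bw:.2f}")
```

In this example job B is fully satisfied at its demand of 30, and the capacity it leaves unused is split between A and C in a 1:2 ratio, which is the work-conserving behavior the summary describes.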

Original Description

Net-P4ct: Enhanced WAN Bandwidth Fair Sharing Using P4 Programmable Switches
Haoran Chen and Mingwei Cui, ByteDance; Yihan Zou, Yihang Miao, Suhan Jiang, Damu Ding, Lirong Lai, Ming Gao, Rui Jiang, Shengyuan He, Anjian Chen, Jiaming Shi, Junjie Wan, Yandong Duan, Ruomin Fang, Hongyu Wu, and Yongping Tang, ByteDance; Qiao Kang, unaffiliated; Guangrui Wu and Xiyun Xu, ByteDance
At growing internet companies like ByteDance, Wide Area Network (WAN) bandwidth sharing across diverse services with varying SLO requirements is a fundamental challenge. Conventional host-based enforcement systems, where agents identify and throttle traffic at the server end, face practical challenges such as "blind spot" traffic, kernel-dependent operational complexity, and significant server resource overhead. To address these issues, we present Net-P4ct, an in-network bandwidth enforcement system using P4 programmable switches. Net-P4ct improves both bandwidth guarantees and fair sharing by shifting dynamic QoS control into the switch data plane. Specifically, it achieves broader traffic coverage by combining host-side traffic tagging with a P4-switch pipeline, where service classification and QoS class assignment are performed. Based on observed traffic metrics, a centralized control plane determines real-time policy updates according to the max-min fair bandwidth allocation. We demonstrate the system's benefits including improved bandwidth utilization, reduced operational complexity, and lower per-byte processing cost. Net-P4ct has been deployed in ByteDance's production WAN for nearly a year, and we hope to share our experience with the community.
