NSDI '26 - Geminet: Learning the Duality-Based Topology-Agnostic Update Operator for Lightweight...
Why It Matters
Because modern datacenters constantly reconfigure links and capacities, Geminet’s fast, topology‑agnostic traffic engineering (TE) lets operators keep congestion low without costly model retraining, which translates into higher network utilization and lower infrastructure spend.
Key Takeaways
- Geminet learns a topology‑agnostic update operator for real‑time traffic engineering.
- It moves the state from path‑level variables to edge‑level dual variables.
- It achieves comparable MRU reduction while using <5% of the GPU memory.
- Inference runs up to 18× faster than prior GNN models.
- It scales to large datacenter topologies and needs no retraining after topology changes.
Summary
The NSDI ’26 presentation introduced Geminet, a learning‑based traffic‑engineering framework that uses a duality‑driven, topology‑agnostic update operator to compute lightweight split‑ratio solutions in rapidly changing network topologies.
The authors argue that a good TE algorithm must simultaneously deliver high solution quality, scale to datacenter‑size graphs, and remain functional after topology changes without retraining. Traditional linear‑programming solvers provide optimal MRU but are too slow, while earlier neural approaches either lock to a single topology or incur heavy graph‑encoding overhead.
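To make the objective concrete, here is a minimal sketch of how a set of split ratios maps to a congestion score. It assumes MRU denotes the maximum utilization over all links, which the summary does not spell out. The three-node topology, the capacities, the demand, and the function name `max_utilization` are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy instance: names and numbers are illustrative only.
capacity = {("a", "b"): 10.0, ("b", "c"): 10.0, ("a", "c"): 5.0}

# One demand a -> c with two candidate paths, each given as a list of edges.
paths = {
    ("a", "c"): [
        [("a", "c")],              # direct path
        [("a", "b"), ("b", "c")],  # detour via b
    ],
}
demand = {("a", "c"): 6.0}


def max_utilization(split):
    """Return the largest load/capacity ratio over all edges.

    `split` maps each demand pair to one ratio per candidate path.
    """
    load = {edge: 0.0 for edge in capacity}
    for pair, ratios in split.items():
        for path, ratio in zip(paths[pair], ratios):
            for edge in path:
                load[edge] += demand[pair] * ratio
    return max(load[edge] / capacity[edge] for edge in capacity)


# Sending everything on the direct link overloads it (6 units on capacity 5).
all_direct = max_utilization({("a", "c"): [1.0, 0.0]})

# Splitting 1/3 direct and 2/3 via b equalizes utilization across the links.
balanced = max_utilization({("a", "c"): [1 / 3, 2 / 3]})
```

A TE algorithm searches for the split ratios that minimize this score; an LP solver finds the optimum exactly, while learned approaches trade some optimality for speed.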
Geminet addresses these gaps with two innovations: (1) a topology‑agnostic edge‑level update operator that replaces learned graph encoders, and (2) a shift from path‑level primal variables to edge‑level dual variables, which dramatically shrinks the state space. Experiments show that Geminet uses only 4 % of GPU memory, reaches the target MRU up to 3.6× faster, and on the largest KDL topology is 18× faster than the best prior GNN model while using less than 0.1 % of its parameters.
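The idea of keeping state on edges rather than on paths can be sketched with a classic hand-written iteration. The code below is not the learned operator from the talk: it is a simple multiplicative-weights style scheme, swapped in purely for illustration, in which one price per edge plays the role of a dual variable and split ratios are re-derived from those prices on every step. The topology, the `temperature` and `step` parameters, and all function names are assumptions.

```python
import math

# Hypothetical toy instance; all names and numbers are illustrative.
capacity = {("a", "b"): 10.0, ("b", "c"): 10.0, ("a", "c"): 5.0}
paths = {("a", "c"): [[("a", "c")], [("a", "b"), ("b", "c")]]}
demand = {("a", "c"): 6.0}


def splits_from_prices(prices, temperature=0.05):
    """Turn edge prices into split ratios with a softmin over path costs."""
    split = {}
    for pair, candidates in paths.items():
        costs = [sum(prices[e] for e in path) for path in candidates]
        low = min(costs)
        weights = [math.exp(-(c - low) / temperature) for c in costs]
        total = sum(weights)
        split[pair] = [w / total for w in weights]
    return split


def utilizations(split):
    """Per-edge load divided by capacity for the given split ratios."""
    load = {e: 0.0 for e in capacity}
    for pair, ratios in split.items():
        for path, ratio in zip(paths[pair], ratios):
            for e in path:
                load[e] += demand[pair] * ratio
    return {e: load[e] / capacity[e] for e in capacity}


def run(iterations=200, step=0.5):
    # The only state carried across iterations: one price per edge,
    # normalized so that sum(price * capacity) == 1.
    norm = sum(capacity.values())
    prices = {e: 1.0 / norm for e in capacity}
    best_split, best_util, history = None, float("inf"), []
    for _ in range(iterations):
        split = splits_from_prices(prices)
        util = utilizations(split)
        worst = max(util.values())
        history.append(worst)
        if worst < best_util:
            best_split, best_util = split, worst
        # Edges below the worst utilization get relatively cheaper,
        # which steers later iterations toward them.
        prices = {e: prices[e] * math.exp(step * (util[e] - worst))
                  for e in capacity}
        scale = sum(prices[e] * capacity[e] for e in capacity)
        prices = {e: prices[e] / scale for e in capacity}
    return prices, best_split, best_util, history


prices, best_split, best_util, history = run()
```

Note that `prices` holds one number per edge no matter how many demands or candidate paths exist, whereas a path-level state would grow with the number of demand pairs times the paths per pair. This sketch uses a fixed update rule; the talk describes replacing such a rule with a learned, topology-agnostic operator.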
By delivering near‑optimal load balancing with orders‑of‑magnitude lower compute and memory footprints, Geminet makes real‑time, adaptive traffic engineering practical for large, reconfigurable datacenter fabrics, potentially lowering operational costs and improving service reliability.