Open Compute Project

Open hardware for data centers: servers, storage, networking

Video•Jun 1, 2026

HM - SGM _ System GPU Management - Workstream - (2026-01-16)

The meeting covered the System GPU Management workstream: agenda changes, attendance constraints, and the need to move a GPU-related presentation forward even though several participants were unavailable. The team revisited open GitHub issues and agreed to retag those still relevant to the newer 1.1 and 1.7 releases while archiving legacy 1.0 items, which streamlines the backlog for upcoming releases.

The key technical discussion centered on how to model GPU-related events within the existing DMTF message registry framework. Participants examined whether to reuse the network-device registry, extend it, or create a dedicated GPU-device or GPU-fabric registry. Many proposed GPU messages map cleanly onto network-device definitions, but a subset has no appropriate mapping. The short-term proposal is to add specific port-related messages and rename ambiguous terms such as “degraded.”

The discussion also explored subscription mechanics: consumers need precise filters so that they receive only GPU-specific events without enumerating numerous origin conditions. Examples from sensor registries illustrated the challenges of dynamic URIs and the risk of stale subscriptions.

Consensus emerged around a hybrid approach: use the network-device registry for generic link events, introduce a GPU-device registry for point-to-point connections, and consider a separate GPU-fabric registry when the topology resembles switch-level fabrics. The expected benefits are a clearer, more maintainable message taxonomy, less duplication across registries, and faster integration of GPU monitoring into existing tooling. By aligning terminology and registry design now, the group aims to support future accelerator workloads, such as AI inference, while minimizing long-term engineering overhead.
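To make the subscription concern concrete, here is a minimal Python sketch of filtering by registry rather than by origin resource. It assumes Redfish-style eventing, where a subscription body can carry a `RegistryPrefixes` list and each event record has a `MessageId` of the form `Registry.Major.Minor.MessageKey`. The `GPUDevice` registry name, the `PortLinkDown` message key, the collector URL, and the helper names are hypothetical and used only for illustration; they are not taken from the meeting or from any published registry.

```python
def build_subscription(destination, registry_prefixes, context="gpu-events"):
    """Build a Redfish-style EventDestination body filtered by registry prefix.

    Filtering on RegistryPrefixes avoids listing individual origin URIs,
    which can go stale when resources are added or removed.
    """
    return {
        "Destination": destination,
        "Protocol": "Redfish",
        "EventFormatType": "Event",
        "Context": context,
        "RegistryPrefixes": list(registry_prefixes),
    }


def registry_prefix(message_id):
    """Return the registry name from 'Registry.Major.Minor.MessageKey'."""
    return message_id.split(".", 1)[0]


def matches(event_record, registry_prefixes):
    """True if the event record belongs to one of the requested registries."""
    return registry_prefix(event_record["MessageId"]) in registry_prefixes


# Hypothetical collector URL and hypothetical GPUDevice registry.
subscription = build_subscription(
    "https://collector.example/events", ["GPUDevice"]
)

# Sample event records: one generic link event, one hypothetical GPU event.
events = [
    {"MessageId": "NetworkDevice.1.0.LinkFlapDetected"},
    {"MessageId": "GPUDevice.1.0.PortLinkDown"},  # hypothetical message
]

gpu_only = [e for e in events if matches(e, subscription["RegistryPrefixes"])]
```

Under the hybrid approach described above, a consumer that also wants generic link events would list both prefixes, for example `["NetworkDevice", "GPUDevice"]`, instead of enumerating every GPU port as an origin resource.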
