CE - IC - Hardware Management for Liquid Cooling - Workstream (2026-04-15)
Why It Matters
Standardizing leak detection and environmental metrics in Redfish will streamline monitoring of liquid‑cooled AI racks, reducing downtime and accelerating adoption of high‑performance compute infrastructure.
Key Takeaways
- Claude with agents solved mock-up generation failures for rack DLC.
- Upcoming OCP Europe summit will provide updates on European collaborations.
- Redfish schema currently lacks detailed leak-detector types and pressure metrics.
- MQTT is used as transport; Redfish remains the unified data model.
- Industry input needed to extend environment metrics for differential pressure sensors.
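To make the leak-detection gap in the takeaways concrete, here is a minimal Python sketch contrasting an alarm-only reading with one that carries extra detail. It is not taken from the Redfish schema or from the draft shared in the meeting: the class names, the `kind` field, the `ROPE` value, and `distance_to_leak_m` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DetectorState(Enum):
    # Simple alarm-style states, the level of detail the discussion
    # says current leak-detection schemas support (values illustrative).
    OK = "OK"
    WARNING = "Warning"
    CRITICAL = "Critical"


class DetectorKind(Enum):
    # Hypothetical detector categories, not values from a published schema.
    SPOT = "Spot"
    ROPE = "Rope"


@dataclass
class LeakDetectorReading:
    name: str
    state: DetectorState
    kind: Optional[DetectorKind] = None         # hypothetical extension
    distance_to_leak_m: Optional[float] = None  # hypothetical extension

    def describe(self) -> str:
        """Render the reading, adding detail only when it is available."""
        text = f"{self.name}: {self.state.value}"
        if self.kind is not None:
            text += f" ({self.kind.value} detector)"
        if self.kind is DetectorKind.ROPE and self.distance_to_leak_m is not None:
            text += f", leak about {self.distance_to_leak_m:.1f} m along the rope"
        return text


# Alarm-only reading versus a reading with the hypothetical extra fields.
basic = LeakDetectorReading("CDU drip tray", DetectorState.CRITICAL)
extended = LeakDetectorReading(
    "Rack manifold rope", DetectorState.CRITICAL, DetectorKind.ROPE, 3.2
)
print(basic.describe())
print(extended.describe())
```

With only an alarm state, an operator learns that something is wet. The extra fields are the kind of information that would let tooling say where.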
Summary
The meeting focused on progress in hardware management for liquid-cooled AI racks, covering a recent breakthrough in mock-up generation, ongoing Redfish schema work, and coordination with the OCP Europe community. Using Claude's agent-enabled cursor mode, the team completed a full rack mock-up (eight servers, a DLC, and a CDU) after earlier attempts had stalled at 50-70% completion. The agenda also previewed the upcoming OCP Europe summit, where European partners will discuss standards alignment and share field experiences.
Key insights included the resolution of the mock-up bottleneck, the identification of gaps in the Redfish schema (particularly around leak-detector types, humidity, temperature, and differential pressure sensors), and the role of MQTT as a transport layer that feeds data into the Redfish model. Participants highlighted that current leak-detection schemas support only simple alarm states, with no granularity for rope-style detectors or distance-to-leak metrics, and that environmental metrics need explicit sensor definitions for barometric and differential air pressure.
Notable examples were drawn from Nvidia's AI-pod reference design, which enumerates temperature, humidity, and airflow pressure but relies on MQTT for telemetry. The discussion underscored how Redfish can provide a unified context for such data, so that telemetry from any transport can be mapped back to a specific chassis, rack, or component location. Attendees also shared a draft schema snippet showing leak-detector groups, sensor types, and the need for additional properties.
Without extending the Redfish schema to cover nuanced leak-detection and pressure sensors, operators risk fragmented monitoring and limited automation in high-density liquid-cooled environments.
Industry contributions to the DMTF working group are essential to standardize these extensions, enabling interoperable management tools and smoother integration of MQTT‑based telemetry across AI data centers.
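The idea of MQTT as transport with Redfish as the data model can be sketched as a small mapping step. The Python example below is an illustration under stated assumptions, not the workstream's implementation: the four-part topic layout, the JSON payload shape, the `to_redfish_context` helper, and the `DifferentialPressurekPa` property name are all hypothetical. The resource path and the temperature and humidity property names are modeled on the Redfish `EnvironmentMetrics` resource. No MQTT client is used; the function only takes the topic and payload that a subscriber would receive.

```python
import json
from typing import Any, Dict

# Hypothetical mapping from transport-level metric names to Redfish-style
# property names. "DifferentialPressurekPa" is a placeholder for the kind
# of property the discussion says still needs to be defined.
METRIC_TO_PROPERTY = {
    "temperature": "TemperatureCelsius",
    "humidity": "HumidityPercent",
    "diff_pressure": "DifferentialPressurekPa",
}


def to_redfish_context(topic: str, payload: bytes) -> Dict[str, Any]:
    """Attach a Redfish-style location to one telemetry message.

    Assumes a hypothetical topic layout of <site>/<rack>/<chassis>/<metric>
    and a JSON payload of the form {"value": <number>}.
    """
    parts = topic.split("/")
    if len(parts) != 4:
        raise ValueError(f"unexpected topic layout: {topic!r}")
    _site, rack, chassis, metric = parts
    if metric not in METRIC_TO_PROPERTY:
        raise KeyError(f"no property mapping for metric {metric!r}")
    reading = json.loads(payload)["value"]
    return {
        "@odata.id": f"/redfish/v1/Chassis/{chassis}/EnvironmentMetrics",
        "Rack": rack,
        "Property": METRIC_TO_PROPERTY[metric],
        "Reading": reading,
    }


# Example message as an MQTT subscriber might receive it.
result = to_redfish_context("site1/rack12/cdu1/humidity", b'{"value": 41.5}')
print(result)
```

Keeping the mapping table separate from the parsing logic means a new sensor type, once standardized, would only need a new table entry, whichever transport delivered the reading.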