Catching Illicit Distributed Training Operations During an AI Pause

LessWrong
Apr 11, 2026

Key Takeaways

  • Original treaty draft required registration for clusters >16 H100 GPUs
  • Loophole allowed distributed training using many sub‑threshold nodes
  • New rule adds 1,280 GB memory threshold to definition
  • Evaders now need five times more nodes, raising costs
  • Whistleblower incentives and inspections improve detection of secret networks

Pulse Analysis

The rapid pace of artificial‑intelligence development has spurred governments and research institutes to seek coordinated safeguards. MIRI’s proposed treaty, which would be led by the United States and China, calls for a registry of any AI compute cluster whose processing power exceeds that of 16 Nvidia H100 GPUs (roughly 15,840 TFLOP/s). While the draft covered clusters with high inter‑node bandwidth, analysts identified a critical gap: a malicious actor could stitch together thousands of modest nodes over the public internet, keeping each individual node under the reporting threshold while still training frontier models.
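To make the loophole concrete, here is a minimal sketch of the original compute‑only check, assuming roughly 990 TFLOP/s per H100 (so that 16 chips give the ~15,840 TFLOP/s cited above); the function name and node configuration are illustrative, not treaty language:

```python
# Illustrative sketch of the original compute-only registration rule.
# Assumption: ~990 TFLOP/s per H100, so 16 chips ~= 15,840 TFLOP/s.
H100_TFLOPS = 990.0
THRESHOLD_TFLOPS = 16 * H100_TFLOPS  # the draft's reporting threshold

def must_register_compute_only(node_tflops: list[float]) -> bool:
    """Under the original draft, only tightly coupled clusters count,
    so each internet-connected node is judged in isolation."""
    return any(t > THRESHOLD_TFLOPS for t in node_tflops)

# An evader runs 1,000 nodes of 8 H100-class chips each: every node
# stays below the threshold, yet the aggregate compute is ~500x the
# size of a covered cluster.
nodes = [8 * H100_TFLOPS] * 1000
print(must_register_compute_only(nodes))  # False: the loophole
print(sum(nodes) / THRESHOLD_TFLOPS)      # 500.0
```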

To assess the feasibility of such a distributed evasion scheme, MIRI researchers built a simulator that extrapolates scaling trends from published machine‑learning experiments. Their findings showed that, although far slower than training over datacenter‑grade interconnects, internet‑based training could still converge on large models if enough nodes were coordinated. The solution was to augment the definition of a "covered chip cluster" with a memory criterion: any set of chips whose total accelerator memory exceeds 1,280 GB (the combined memory of 16 H100 GPUs) now triggers registration. This simple addition forces evaders to fragment their hardware across many more machines, inflating both financial outlay and operational complexity.
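A minimal sketch of how the amended definition could be evaluated, assuming 80 GB of HBM per H100 (so 16 chips give the 1,280 GB threshold); aggregating memory over all coordinated nodes, and every name below, is an illustrative assumption rather than treaty text:

```python
# Illustrative sketch of the amended "covered chip cluster" definition.
# Assumptions: ~990 TFLOP/s and 80 GB of HBM per H100; the memory
# criterion is read here as summing over all coordinated nodes.
H100_TFLOPS = 990.0
H100_MEMORY_GB = 80.0
THRESHOLD_TFLOPS = 16 * H100_TFLOPS        # ~15,840 TFLOP/s
THRESHOLD_MEMORY_GB = 16 * H100_MEMORY_GB  # 1,280 GB

def must_register(node_tflops: list[float],
                  node_memory_gb: list[float]) -> bool:
    """Registration triggers if any node exceeds the compute threshold
    OR the coordinated set's total accelerator memory exceeds 1,280 GB."""
    over_compute = any(t > THRESHOLD_TFLOPS for t in node_tflops)
    over_memory = sum(node_memory_gb) > THRESHOLD_MEMORY_GB
    return over_compute or over_memory

# The same 1,000-node scheme now trips the memory criterion:
# 8 chips * 80 GB * 1,000 nodes = 640,000 GB, far above 1,280 GB.
nodes = [8 * H100_TFLOPS] * 1000
memories = [8 * H100_MEMORY_GB] * 1000
print(must_register(nodes, memories))  # True: registration required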

Beyond the technical fix, the treaty embeds enforcement mechanisms that make clandestine networks increasingly untenable. Early chip‑consolidation efforts will inventory the majority of high‑end AI hardware, leaving few untracked units. Incentivized whistleblowing and random inspections further raise the risk of exposure, especially given the thousands of technicians required to maintain a distributed setup. By tightening the loophole, the agreement improves the odds of universal adoption, offering a pragmatic path toward global AI safety in an era where compute power is the primary lever of risk.
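The staffing point can be made quantitative with a toy secrecy model, assuming each technician independently has some small annual probability of reporting; the 0.1% figure below is purely an assumption for illustration, not a number from the post:

```python
# Toy secrecy model: each technician independently reports with annual
# probability p_leak (the 0.001 default is an illustrative assumption).
def p_stays_secret(n_technicians: int, p_leak: float = 0.001) -> float:
    """Probability that none of n technicians reports in a given year."""
    return (1.0 - p_leak) ** n_technicians

for n in (10, 100, 1_000, 5_000):
    print(f"{n:>5} technicians -> {p_stays_secret(n):.3f} chance of secrecy")
# Output: 0.990, 0.905, 0.368, 0.007 -- secrecy collapses as staffing grows.
```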
