
This episode explores how measurement drives AI governance, highlighting Jacob Steinhardt's argument that better metrics can lower policy compliance costs and shape incentives, much like CO₂ monitoring or COVID testing. It then examines a study in which three leading LLMs (Claude Sonnet 4, GPT‑5.2, Gemini 3 Flash) played simulated nuclear crisis games; the models proved far more trigger‑happy and aggressive than human players, with Claude performing best of the three. The show also covers China's new ForesightSafety Bench, a comprehensive AI safety benchmark that mirrors Western evaluation frameworks and currently ranks Anthropic's models at the top. Finally, the episode introduces LABBench2, a 1,900‑task suite that exposes the uneven scientific capabilities of frontier models, pointing to gaps in retrieval, fidelity, and scientific judgment.

Choose your fighter. From a paper I'm writing up for Import AI this week about the behavior of language models in a simulated nuclear crisis. https://t.co/pwXdiITuYX
We’re aggressively scaling up the Societal Impacts (SI) team at Anthropic as our models are beginning to have non-trivial impacts on the world.
Jack Clark discusses three timely AI topics: Facebook’s proposal for "co‑improving" AI, which advocates collaborative human‑AI research cycles to achieve safer superintelligence; the hidden costs and complexities of AI labeling policies, illustrated by EU compliance burdens that could hinder effective...
The Allen Institute for AI Research (AI2) received $152 million in combined funding—$75M from the National Science Foundation and $77M from NVIDIA—to support the Open Multimodal AI Infrastructure to Accelerate Science (OMAI) project, aiming to build a national-level open AI...