Have We Already Lost? Part 1: The Plan in 2024

LessWrong · Apr 9, 2026

Key Takeaways

  • The 2024 AI safety plan hinged on voluntary commitments and AI‑assisted research.
  • Governance, policy, and technical milestones largely fell short of expectations.
  • Community over‑invested in Anthropic, reducing independent safety initiatives.
  • Faster‑than‑expected AI progress and a poor political climate heightened alignment risks.
  • Emerging empirical alignment methods and public skepticism offer renewed hope.

Pulse Analysis

In mid‑2024 the AI safety community drafted a three‑stage “victory” roadmap: first, buy time by securing voluntary, conditional commitments from developers; second, build moderately powerful research AIs and extract 2–3× cognitive labor from them; third, translate that assistance into concrete technical and policy safeguards. The strategy assumed that halting AI progress outright was impossible, but that a coordinated pause could be triggered once risks became acute. Investment was funneled into Anthropic as a flagship lab, while evaluation frameworks and early empirical alignment techniques were earmarked as short‑term levers.

Two years later, many of those assumptions have proven fragile. Legislative inertia and fragmented international standards left voluntary commitments largely symbolic, while the pace of model scaling outstripped the modest oversight tools the community could field. Heavy reliance on Anthropic narrowed the diversity of safety research, and several ambitious projects—such as mechanistic interpretability at scale—failed to deliver measurable progress. Coupled with a deteriorating U.S. political climate and rising geopolitical competition, the gap between AI capability and governance has widened, intensifying existential risk concerns.

Nevertheless, the outlook is not hopeless. Recent progress in empirical alignment, including scalable oversight on frontier models, suggests that “wing‑it” approaches may be more viable than previously thought. Anthropic’s continued lead provides a potential anchor for responsible development, while growing public distrust of big‑tech AI fuels legislative pressure worldwide. Non‑U.S. governments are beginning to assert leverage through export controls and joint research agreements, offering a geopolitical counterbalance. If the community can diversify its research portfolio and translate empirical gains into policy, it may still steer the trajectory toward a safer AI future.
