Blog • Apr 16, 2026
Open-World Evaluations for Measuring Frontier AI Capabilities
The paper introduces “open‑world evaluations,” a new class of AI testing that places agents in messy, real‑world tasks rather than tidy benchmarks. It surveys ten recent experiments, outlines best practices, and launches the CRUX collaboration of 17 researchers to run such evaluations regularly. In CRUX’s first study, an AI agent built and published an iOS app to the Apple App Store, incurring roughly $1,000 in costs and making only two errors, one of which required a manual fix. The authors argue that these evaluations provide early warnings of emerging capabilities, such as automated app‑store spam.
By AI as Normal Technology