Writing Code versus Shipping Code: Productivity Effects Across Generations of AI Coding Tools

CEPR — VoxEU•Jun 20, 2026

Why It Matters

The findings temper optimistic AI‑productivity forecasts by showing that gains in coding speed do not automatically boost final product output, highlighting downstream bottlenecks that firms and policymakers must address.

Key Takeaways

•Autocomplete lifts commit output by ~40%; sync agents lift it by ~140%.
•Async agents raise coding activity by ~180%, but releases rise only ~30%.
•Productivity gains shrink at each higher production stage, consistent with the weak‑link hypothesis.
•New app releases rise, yet user engagement per app stays flat.
•Future AI impact hinges on automating review, integration, testing, and discovery.

Pulse Analysis

The study uses a matched event‑study design that pairs each AI‑tool adopter with a near‑identical developer observed a year earlier, a comparison intended to isolate the causal effect of adoption on productivity. Across three tool generations, the data show a clear escalation: basic autocomplete raises commit counts by roughly 40%, sync agents raise cumulative output by about 140%, and the latest async agents raise coding activity by an estimated 180%. These figures align with earlier experimental work and indicate that AI can dramatically accelerate the most granular coding tasks.
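The matched comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' estimator: the matching rule (nearest pre‑period activity), the function names `match_controls` and `relative_effect`, and every number in the toy data are assumptions made up for the example.

```python
from statistics import mean

def match_controls(adopters, controls):
    """Pair each adopter with the unused control whose pre-period activity is closest.

    Assumption: a single matching variable ("pre"); a real study would
    match on many more characteristics.
    """
    available = list(controls)
    pairs = []
    for adopter in adopters:
        best = min(available, key=lambda c: abs(c["pre"] - adopter["pre"]))
        available.remove(best)  # match without replacement
        pairs.append((adopter, best))
    return pairs

def relative_effect(pairs):
    """Adopters' average post/pre growth relative to their matched controls, minus 1."""
    adopter_growth = mean(a["post"] / a["pre"] for a, _ in pairs)
    control_growth = mean(c["post"] / c["pre"] for _, c in pairs)
    return adopter_growth / control_growth - 1.0

# Hypothetical commit counts before and after the adoption date.
adopters = [{"pre": 10, "post": 12}, {"pre": 20, "post": 25}]
# Hypothetical comparison developers observed one year earlier.
controls = [{"pre": 21, "post": 21}, {"pre": 9, "post": 9}, {"pre": 50, "post": 55}]

pairs = match_controls(adopters, controls)
effect = relative_effect(pairs)
print(f"Estimated relative effect on commits: {effect:.1%}")
```

The key design choice is that the effect is measured against what similar developers did without the tool, rather than against the adopters' own past output alone.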

However, the research uncovers a classic weak‑link or O‑ring dynamic in software production. While AI‑generated code multiplies at the commit level, the downstream stages (pull‑request review, integration, testing, and final release) remain human‑centric and constrain overall output. Consequently, project counts climb by only 50% and shipped releases by 30%, with no measurable rise in consumer usage. This bottleneck helps explain why broader market indicators, such as app store releases, show modest growth and why many new apps fail to attract users. The pattern suggests that task‑level productivity gains cannot be linearly extrapolated to aggregate economic impact.
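One common way to formalize a weak‑link dynamic is a CES aggregate in which the production stages are complements. The sketch below is an illustration under that assumption, not the model used in the study: the function `ces_output`, the four equally weighted stages, the values of `rho`, and the 2.8x coding multiplier (standing in for a 180% increase) are all choices made for the example.

```python
def ces_output(stages, rho):
    """Equal-weight CES aggregate of stage productivities.

    rho < 0 makes stages complements; as rho falls toward minus infinity,
    output approaches the minimum stage (the pure weak-link case).
    """
    n = len(stages)
    return (sum(x ** rho for x in stages) / n) ** (1.0 / rho)

# Hypothetical stages: coding, review, testing, release (all normalized to 1).
baseline = [1.0, 1.0, 1.0, 1.0]
# Assume only the coding stage improves, by 180% (a 2.8x multiplier).
with_ai = [2.8, 1.0, 1.0, 1.0]

base = ces_output(baseline, rho=-2.0)
moderate = ces_output(with_ai, rho=-2.0)   # stages are moderate complements
strong = ces_output(with_ai, rho=-10.0)    # stages are strong complements

print(f"Aggregate gain, rho=-2:  {moderate / base - 1:.1%}")
print(f"Aggregate gain, rho=-10: {strong / base - 1:.1%}")
```

In this toy setup the aggregate gain is far smaller than the coding‑stage gain, and it shrinks further as the stages become stronger complements, which mirrors the qualitative pattern of large commit‑level effects and small release‑level effects.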

Looking forward, the authors argue that the next wave of AI impact will depend on extending automation beyond code generation to the remaining human‑heavy stages. Tool providers are already experimenting with AI‑assisted code review and automated testing, which could shift the bottleneck downstream. For investors and policymakers, the implication is clear: measuring AI’s macroeconomic contribution requires tracking progress in these complementary stages, not just raw coding metrics. As generative AI matures, its true productivity dividend will emerge when the entire software pipeline becomes more seamlessly automated.
