Why Backtests Decay: Regime Dependence and Crowding

Harbourfront Quantitative
May 8, 2026

Key Takeaways

  • Study covers 1,726 strategies from ten global institutions (2009‑2025).
  • Live performance falls short of backtested results by about 2‑3 percentage points per year.
  • Factor regime at launch explains most apparent backtest skill.
  • Regime timing and crowding cause residual decay after peer benchmarking.
  • Recommended haircut should rise with extremity of pre‑launch factor regime.

Pulse Analysis

Backtesting remains a cornerstone of systematic strategy design, offering a low‑cost glimpse into how a model might behave under historical market conditions. Yet practitioners have long warned that a strong backtest can mask overfitting, data‑snooping, or simply the luck of a favorable regime. The recent arXiv paper by Chang Liu quantifies this intuition using a large commercial sample of 1,726 strategies from ten global institutions, and finds that the average live‑performance shortfall is around two to three percentage points per year. By comparing each strategy to a leave‑one‑out bucket‑average peer (the average of the other strategies in the same bucket, excluding the strategy itself), the study isolates the common factor environment that drives much of the backtest’s apparent outperformance.
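The leave‑one‑out peer comparison can be illustrated with a minimal Python sketch. This is not the paper's code: the field names (`name`, `bucket`, `live_return`), the bucket labels, and the sample returns are all hypothetical, and the paper's actual bucketing rules are not described here. The sketch only shows the general idea of benchmarking each strategy against the mean of the other strategies in its bucket.

```python
from collections import defaultdict


def leave_one_out_excess(strategies):
    """Return {name: own return minus the mean return of the OTHER
    strategies in the same bucket}. Field names are hypothetical."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for s in strategies:
        totals[s["bucket"]] += s["live_return"]
        counts[s["bucket"]] += 1

    excess = {}
    for s in strategies:
        n = counts[s["bucket"]]
        if n < 2:
            # A bucket with a single member has no peers to compare against.
            excess[s["name"]] = None
            continue
        # Remove the strategy's own return before averaging (leave-one-out).
        peer_mean = (totals[s["bucket"]] - s["live_return"]) / (n - 1)
        excess[s["name"]] = s["live_return"] - peer_mean
    return excess


# Made-up illustrative data, not taken from the study.
sample = [
    {"name": "A", "bucket": "carry", "live_return": 0.06},
    {"name": "B", "bucket": "carry", "live_return": 0.02},
    {"name": "C", "bucket": "carry", "live_return": 0.04},
    {"name": "D", "bucket": "momentum", "live_return": 0.05},
]
result = leave_one_out_excess(sample)
```

Excluding a strategy from its own benchmark keeps its return from pulling the peer average toward itself, which matters most in small buckets.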

The paper identifies two structural mechanisms behind the decay. First, the timing of a strategy’s launch relative to macro‑factor regimes, such as a carry trade introduced during a low‑volatility stretch, can artificially inflate backtest returns. Second, a horizon‑dependent launch‑density effect suggests that crowded entry points amplify the decay as more market participants chase the same signal. Together, these forces leave the residual skill component economically negligible, prompting the recommendation of a regime‑adjusted haircut that scales with the extremity of pre‑launch factor conditions.
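One way to picture a haircut that rises with regime extremity is the sketch below. Everything specific in it is an assumption made for illustration: measuring extremity as an absolute z‑score, the linear functional form, and the `base`, `slope`, and `cap` values are hypothetical and are not taken from the paper, which may define extremity and the haircut differently.

```python
from statistics import mean, stdev


def regime_extremity(history, pre_launch):
    """Absolute z-score of the pre-launch factor return against its own
    history. Using a z-score as the extremity measure is an assumption."""
    return abs(pre_launch - mean(history)) / stdev(history)


def regime_adjusted_haircut(history, pre_launch, base=0.02, slope=0.01, cap=0.10):
    """Haircut (in return units per year) that grows with regime extremity.
    base, slope, and cap are illustrative placeholders, not estimates."""
    z = regime_extremity(history, pre_launch)
    return min(base + slope * z, cap)


# Made-up factor return history, for illustration only.
history = [0.01, 0.03, 0.02, 0.04, 0.00]

calm = regime_adjusted_haircut(history, pre_launch=0.02)     # near the historical mean
extreme = regime_adjusted_haircut(history, pre_launch=0.06)  # far from the mean
capped = regime_adjusted_haircut(history, pre_launch=1.00)   # hits the cap
```

In this toy setup a launch during an ordinary factor environment receives only the base haircut, while a launch after an unusually strong factor run receives a larger one, up to the cap.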

For institutional allocators and quantitative developers, the findings argue for a shift from raw backtest numbers to a benchmark‑adjusted evaluation framework. Incorporating peer‑group comparisons and regime filters can improve the reliability of alpha forecasts, reducing the risk of allocating capital to strategies that merely rode a transient market wave. As the industry increasingly embraces machine‑learning models, the lesson remains clear: robust performance attribution must account for both the timing of factor regimes and the crowding dynamics that erode returns over time.
