
Machine Learning System Design Interview #47 - The EWC Rigidity Trap

Key Takeaways
- EWC can over-penalize weight changes, freezing features that still need to adapt
- An aggressive λ lets the penalty gradient dominate, halting new‑domain learning
- Capacity lockout: the weights that encode old‑domain features are the same ones the new task must change
- Decay the EWC penalty over time to restore plasticity without forgetting
- Add LoRA adapters for isolated capacity, or a replay buffer to keep rehearsing old data
Pulse Analysis
Elastic Weight Consolidation (EWC) has become a go‑to technique for mitigating catastrophic forgetting in continual‑learning systems. EWC uses the Fisher Information Matrix (in practice, usually only its diagonal) to measure how much each weight matters for previously learned tasks. It then adds a quadratic penalty that resists changes to the important weights. This stabilizes performance on earlier tasks, but it also sharpens the classic stability‑plasticity dilemma: when the penalty is too strong, the model becomes so rigid that it cannot incorporate new information. Senior ML engineers who deploy adaptive models at scale need to recognize when the regularization term outweighs the learning signal.
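The mechanism above can be made concrete with a minimal sketch. It makes two assumptions. First, the diagonal Fisher is estimated as the mean of squared per‑sample gradients of the old task's log‑likelihood (the "empirical Fisher" approximation). Second, the penalty takes the usual form (λ/2) Σ_i F_i (θ_i − θ*_i)², where θ* are the weights after training on the old task. The function names are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import List, Sequence


def diagonal_fisher(per_sample_grads: Sequence[Sequence[float]]) -> List[float]:
    """Empirical diagonal Fisher: mean of squared per-sample log-likelihood gradients."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def ewc_penalty(theta: Sequence[float], theta_star: Sequence[float],
                fisher: Sequence[float], lam: float) -> float:
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * lam * sum(f * (t - s) ** 2
                           for t, s, f in zip(theta, theta_star, fisher))


def ewc_penalty_grad(theta: Sequence[float], theta_star: Sequence[float],
                     fisher: Sequence[float], lam: float) -> List[float]:
    """Gradient of the penalty w.r.t. theta: lam * F_i * (theta_i - theta_star_i)."""
    return [lam * f * (t - s) for t, s, f in zip(theta, theta_star, fisher)]


if __name__ == "__main__":
    # Illustrative per-sample gradients collected on old-task data.
    grads = [[2.0, 0.1], [-2.0, 0.1], [2.0, -0.1]]
    fisher = diagonal_fisher(grads)  # weight 0 gets a far larger importance than weight 1
    theta_star = [1.0, 1.0]          # weights after the old task
    theta = [1.5, 1.5]               # weights partway through new-task training
    print(fisher)
    print(ewc_penalty(theta, theta_star, fisher, lam=100.0))
    print(ewc_penalty_grad(theta, theta_star, fisher, lam=100.0))
```

Both weights moved by the same 0.5, yet the penalty gradient on weight 0 is 400 times larger than on weight 1. That is how the Fisher weighting concentrates resistance on the weights the old task relied on.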
In practice, an aggressive EWC regularization multiplier (λ) can dominate the loss gradient, a failure mode this DeepMind interview scenario calls the “rigidity trap.” The penalty gradients swamp the gradients derived from fresh data. The result is capacity lockout. The weights that encode core features for the old domain are the same weights needed to map the new domain, so the penalty pins exactly the capacity the new task requires. Training therefore stalls despite ample data and epochs. Simple fixes such as raising the learning rate only mask the problem. The learning rate scales the penalty gradient and the data gradient by the same factor, so their ratio, and the imbalance, stays the same.
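The gradient imbalance shows up clearly in a deliberately tiny setting. This is a sketch under assumed numbers. It has one weight θ, a new‑task loss (θ − b)²/2 whose optimum b differs from the old optimum θ*, and an EWC term (λF/2)(θ − θ*)². Setting the total gradient (θ − b) + λF(θ − θ*) to zero gives θ = (b + λFθ*)/(1 + λF). The weight therefore covers only 1/(1 + λF) of the distance to the new optimum, and the learning rate does not appear in that expression. The helper names are hypothetical.

```python
def train_1d(b: float, theta_star: float, lam_f: float,
             lr: float, steps: int) -> float:
    """Plain gradient descent on (theta - b)^2 / 2 + (lam_f / 2) * (theta - theta_star)^2."""
    theta = theta_star  # start from the old-task solution
    for _ in range(steps):
        task_grad = theta - b                        # pulls toward the new optimum
        penalty_grad = lam_f * (theta - theta_star)  # pulls back toward the old weights
        theta -= lr * (task_grad + penalty_grad)
    return theta


def fixed_point(b: float, theta_star: float, lam_f: float) -> float:
    """Point where the task gradient and the penalty gradient cancel."""
    return (b + lam_f * theta_star) / (1.0 + lam_f)


if __name__ == "__main__":
    b, theta_star, lam_f = 1.0, 0.0, 100.0  # new optimum at 1.0, lambda * F = 100
    # Plain GD on this quadratic is stable only if lr * (1 + lam_f) < 2, i.e. lr < ~0.0198.
    for lr in (0.005, 0.015):
        theta = train_1d(b, theta_star, lam_f, lr, steps=2000)
        print(f"lr={lr}: theta={theta:.4f}")  # both settle near 0.0099
    print(f"analytic fixed point: {fixed_point(b, theta_star, lam_f):.4f}")
```

Tripling the learning rate changes only how fast θ reaches the same stuck point, about 1% of the way to b. Pushing lr past roughly 0.0198 in this toy does not escape the trap either, because plain gradient descent simply diverges.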
The remedy is to rebalance stability and plasticity rather than discard EWC altogether. Practitioners can decay λ over time so that the constraints relax gradually as the model assimilates new tasks. Complementary strategies supply what EWC alone cannot. Low‑rank adapters (LoRA) provide isolated capacity that learns the new domain without perturbing the protected backbone. A sparse experience‑replay buffer keeps rehearsing old‑domain samples, so relaxing the penalty does not turn into forgetting. This hybrid approach preserves accuracy on earlier tasks while unlocking adaptation. Senior ML engineers can cite the tactic confidently in interviews and apply it in production‑grade continual‑learning pipelines. It also costs less compute than retraining from scratch.
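Here is a minimal sketch of the λ‑decay idea on a one‑weight toy problem: a new‑task loss (θ − b)²/2 plus an EWC term (λ_t F/2)(θ − θ*)². The decay is exponential per step, with a floor λ_min that keeps some protection of the old weights in place. The exponential shape, the floor, and the names `decayed_lambda` and `train_with_decay` are illustrative assumptions, not a prescribed schedule.

```python
import math


def decayed_lambda(step: int, lam0: float, decay_rate: float, lam_min: float) -> float:
    """Exponentially decaying EWC strength with a floor: max(lam_min, lam0 * exp(-rate * step))."""
    return max(lam_min, lam0 * math.exp(-decay_rate * step))


def train_with_decay(b: float, theta_star: float, fisher: float, lam0: float,
                     decay_rate: float, lam_min: float, lr: float, steps: int) -> float:
    """Gradient descent on (theta - b)^2 / 2 + (lam_t * F / 2) * (theta - theta_star)^2."""
    theta = theta_star  # start from the old-task solution
    for step in range(steps):
        lam_t = decayed_lambda(step, lam0, decay_rate, lam_min)
        grad = (theta - b) + lam_t * fisher * (theta - theta_star)
        theta -= lr * grad
    return theta


if __name__ == "__main__":
    b, theta_star, fisher = 1.0, 0.0, 1.0
    # decay_rate=0 with lam_min=lam0 keeps lambda fixed at 100 (the rigid baseline).
    rigid = train_with_decay(b, theta_star, fisher, lam0=100.0, decay_rate=0.0,
                             lam_min=100.0, lr=0.005, steps=3000)
    relaxed = train_with_decay(b, theta_star, fisher, lam0=100.0, decay_rate=0.01,
                               lam_min=1.0, lr=0.005, steps=3000)
    print(f"constant lambda=100: theta={rigid:.4f}")    # stuck near 0.0099
    print(f"decayed to lambda=1: theta={relaxed:.4f}")  # near 0.5, halfway to b
```

The floor is the design choice that matters. Decaying λ all the way to zero would let θ drift to b and give up the old solution entirely, which is the forgetting EWC exists to prevent. A nonzero floor turns the fix into a rebalancing rather than a removal.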