
LLM System Design Interview #36 - The Isomorphic MLP Trick

Key Takeaways
- SwiGLU adds a third weight matrix, increasing FFN parameters by 50% at the same hidden size
- Keeping the 4× hidden dimension inflates the memory and FLOP budget on large clusters
- Scale the expansion factor by 2/3, from 4× to 8/3 (≈2.67×), to match the original parameter count
- LLaMA and Mistral use non‑power‑of‑two FFN dimensions rather than a plain 4× expansion
- Mis‑scaled FFNs can waste millions of dollars on H100 GPU farms
Pulse Analysis
Switching a transformer’s feed‑forward network from ReLU to SwiGLU sounds straightforward, but the gating mechanism adds a third linear projection. With that extra matrix, the classic 4 × d_model hidden size carries 1.5× the parameters (50% more), with a proportional rise in weight memory and floating‑point operations. On clusters of thousands of H100 GPUs, this hidden cost can quickly climb into millions of dollars, turning a seemingly minor design choice into a costly oversight.
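A quick back‑of‑the‑envelope check makes the 50% figure concrete. The sketch below is a minimal illustration, assuming bias‑free projections, bf16 weights (2 bytes each), the rough rule of 2 FLOPs per weight per token, and d_model = 4,096 chosen only as an example; the helper names are hypothetical, not from any library.

```python
# Minimal sketch: FFN parameter, FLOP and memory comparison, ReLU vs SwiGLU at 4x.
# Assumptions: bias-free projections, bf16 weights (2 bytes), ~2 FLOPs per weight per token.

def ffn_params(d_model: int, d_hidden: int, n_matrices: int) -> int:
    """Weights in an FFN made of n_matrices projections of shape d_model x d_hidden."""
    return n_matrices * d_model * d_hidden

def ffn_flops_per_token(d_model: int, d_hidden: int, n_matrices: int) -> int:
    """Forward-pass matmul FLOPs per token: one multiply and one add per weight."""
    return 2 * ffn_params(d_model, d_hidden, n_matrices)

d_model = 4096            # illustrative width (assumption)
d_hidden = 4 * d_model    # the classic 4x expansion, kept unchanged

relu = ffn_params(d_model, d_hidden, n_matrices=2)    # up and down projections
swiglu = ffn_params(d_model, d_hidden, n_matrices=3)  # gate, up and down projections

print(f"ReLU FFN:   {relu:,} params")
print(f"SwiGLU FFN: {swiglu:,} params ({swiglu / relu:.2f}x)")
print(f"FLOPs/token: {ffn_flops_per_token(d_model, d_hidden, 2):,} -> "
      f"{ffn_flops_per_token(d_model, d_hidden, 3):,}")
print(f"Extra bf16 weight memory per layer: {(swiglu - relu) * 2 / 2**20:.0f} MiB")
```

Every quantity scales linearly with the number of matrices, so the 1.5× ratio holds for any d_model, not just the example width.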
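The remedy is simple arithmetic: a gated FFN has three weight matrices versus two for ReLU, so the hidden dimension must be scaled by a factor of 2/3 (that is, reduced to two‑thirds of its original width). Multiplying the standard 4× expansion by 2/3 yields a hidden size of 8/3 × d_model (≈2.67×), which restores parameter parity: three d_model × (8/3)d_model matrices hold 8·d_model² weights, exactly as many as two d_model × 4d_model matrices. Gated FFNs are why models such as LLaMA and Mistral use intermediate dimensions that look unconventional. In LLaMA’s case the 8/3 width is rounded up to a hardware‑friendly multiple (LLaMA‑7B pairs d_model = 4,096 with an FFN width of 11,008), keeping the parameter count essentially unchanged while leveraging SwiGLU’s performance benefits.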
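The 2/3 rule plus the rounding step fits in a few lines. This is a minimal sketch assuming LLaMA‑style rounding up to a multiple of 256; `swiglu_hidden_dim` and its `multiple_of` parameter are illustrative names, and the 256 default is an assumption rather than a universal constant. With d_model = 4,096 it reproduces the 11,008 width quoted above.

```python
# Minimal sketch: parity hidden size for a 3-matrix gated FFN (SwiGLU).
# Assumption: round up to a hardware-friendly multiple, LLaMA-style (256 here).

def swiglu_hidden_dim(d_model: int, expansion: int = 4, multiple_of: int = 256) -> int:
    """Scale the 2-matrix expansion by 2/3, then round up to `multiple_of`."""
    hidden = int(2 * expansion * d_model / 3)  # 2/3 of 4 * d_model = 8/3 * d_model
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)  # ceiling division

for d in (4096, 5120, 8192):
    h = swiglu_hidden_dim(d)
    relu_params = 2 * d * (4 * d)   # baseline: two d x 4d matrices
    swiglu_params = 3 * d * h       # gated: three d x h matrices
    print(f"d_model={d}: hidden={h} ({h / d:.3f}x), "
          f"param ratio vs 4x ReLU = {swiglu_params / relu_params:.4f}")
```

Rounding up costs a fraction of a percent of extra parameters (about 0.8% at d_model = 4,096) in exchange for matrix shapes that are friendlier to GPU matrix kernels.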
For AI engineers and interviewees, the lesson extends beyond a single trick. Maintaining strict parameter budgets ensures fair ablation studies, avoids unnecessary VRAM pressure, and protects organizations from hidden compute expenses. When designing or refactoring large language models, always recalculate the expansion factor whenever a new activation introduces additional weight matrices. Doing so safeguards both research integrity and the bottom line in today’s compute‑intensive AI landscape.
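The same reasoning generalizes to any FFN variant: parameters scale with (number of matrices) × d_model × d_hidden, so holding that product fixed gives the parity width. A hypothetical pre‑launch guard for ablations might look like the sketch below; `parity_hidden_dim`, `assert_param_parity`, and the 1% tolerance are illustrative assumptions, not a standard API.

```python
import math

# Hypothetical helpers for keeping ablation arms on the same FFN parameter budget.

def parity_hidden_dim(d_hidden_base: int, n_base: int, n_new: int, multiple_of: int = 1) -> int:
    """Hidden size that keeps n_matrices * d_hidden (hence FFN params) at the baseline."""
    exact = d_hidden_base * n_base / n_new
    return multiple_of * math.ceil(exact / multiple_of)  # round up to the multiple

def assert_param_parity(params_a: int, params_b: int, tol: float = 0.01) -> float:
    """Raise if two arms differ by more than `tol` (assumed 1%); return relative drift."""
    drift = abs(params_a - params_b) / params_b
    if drift > tol:
        raise ValueError(f"parameter drift {drift:.2%} exceeds tolerance {tol:.2%}")
    return drift

d_model = 4096                                   # illustrative width (assumption)
baseline = 2 * d_model * (4 * d_model)           # 2-matrix ReLU FFN at 4x

h_naive = 4 * d_model                            # keeping 4x after switching to SwiGLU
h_fixed = parity_hidden_dim(4 * d_model, n_base=2, n_new=3, multiple_of=256)

print(h_fixed, f"{assert_param_parity(3 * d_model * h_fixed, baseline):.2%}")
try:
    assert_param_parity(3 * d_model * h_naive, baseline)
except ValueError as err:
    print("naive config rejected:", err)
```

Failing fast before launch is cheaper than discovering a 50% parameter mismatch after an ablation has already consumed its compute budget.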