Feature Selection Leads to Divergent Neurobiological Interpretations of Brain-Based Machine Learning Biomarkers

Nature Human Behaviour, Apr 15, 2026

Why It Matters

The work challenges the default interpretive framework for neuroimaging biomarkers, urging researchers to consider broader feature sets to avoid misleading conclusions and to improve precision‑medicine applications.

Key Takeaways

  • Univariate feature selection omits many predictive brain connections.
  • Mid‑ranked deciles achieve prediction accuracy comparable to the top decile.
  • Overlooked feature sets generalize across independent neuroimaging cohorts.
  • Different deciles highlight distinct functional network contributions.
  • Ridge regression without selection underperforms but still captures meaningful signals.

Pulse Analysis

Machine‑learning models built on neuroimaging data have become a cornerstone for linking brain architecture to cognition, development, and mental health. Traditionally, researchers prune the massive feature space by applying univariate filters that retain only the strongest brain‑behaviour correlations. While this reduces computational load and simplifies model interpretation, it assumes that the most statistically prominent edges fully capture the underlying biology. In reality, brain networks are highly distributed, and weaker connections can collectively encode substantial information about individual differences.
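As a rough illustration of the univariate filtering described above, the sketch below ranks features by the absolute value of their Pearson correlation with a behavioural score and keeps only the top 10 %. It is a minimal sketch and not the study's pipeline: the data are synthetic, and the names pearson_r, rank_features and select_top_fraction are hypothetical helpers introduced here for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant input carries no linear association
    return cov / math.sqrt(var_x * var_y)


def rank_features(X, y):
    """Return feature indices ordered from strongest to weakest |r| with y."""
    n_features = len(X[0])
    scores = [abs(pearson_r([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def select_top_fraction(X, y, fraction=0.10):
    """Keep only the top `fraction` of features by |r| (at least one)."""
    ranked = rank_features(X, y)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k]


# Synthetic stand-in for connectivity data (assumption): 200 subjects, 50 edges.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(200)]
# The behaviour depends strongly on edge 3 and weakly on edges 10 to 19.
y = [2.0 * row[3] + 0.3 * sum(row[10:20]) + rng.gauss(0, 1) for row in X]

top_edges = select_top_fraction(X, y, fraction=0.10)
print("edges kept by the filter:", top_edges)
```

In this toy setup ten weak edges each contribute to the score, yet the filter keeps only five edges in total, so most of the weak contributors are necessarily discarded. In a real analysis the ranking would be computed inside each training fold so that held-out data do not influence which features are kept.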

The Yale‑led investigation introduced a decile‑based framework that systematically evaluates non‑overlapping subsets of connectivity features, from the top 10 % down to the bottom 10 %. Across four large developmental cohorts totaling more than 12,000 participants, models trained on mid‑ranked and even low‑ranked deciles reliably predicted executive function, language ability, age, sex, and several psychiatric measures. Crucially, external validation demonstrated that these alternative feature sets generalized as well as the conventional top‑decile models, yet each decile highlighted a unique constellation of functional networks, underscoring the multiplicity of plausible neurobiological explanations.
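The decile idea can be sketched in a few lines: rank all features once, cut the ranking into ten non-overlapping groups, and evaluate each group separately on held-out data. The example below is a sketch under stated assumptions and not the authors' method. It uses synthetic data, and it scores each decile with a simple signed sum of its edges instead of the ridge models mentioned in the summary. The names pearson_r and split_into_deciles are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant input carries no linear association
    return cov / math.sqrt(var_x * var_y)


def split_into_deciles(ranked, n_bins=10):
    """Split a ranked list into n_bins contiguous, non-overlapping groups."""
    n = len(ranked)
    return [ranked[i * n // n_bins:(i + 1) * n // n_bins] for i in range(n_bins)]


# Synthetic data (assumption): 400 subjects, 100 edges, each edge with a small
# true effect, so the signal is spread over many weak features.
rng = random.Random(1)
n_subjects, n_edges = 400, 100
weights = [rng.gauss(0, 0.2) for _ in range(n_edges)]
X = [[rng.gauss(0, 1) for _ in range(n_edges)] for _ in range(n_subjects)]
y = [sum(w * v for w, v in zip(weights, row)) + rng.gauss(0, 1) for row in X]

# Rank on the training half only, so the held-out half stays untouched.
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]
train_r = [pearson_r([row[j] for row in X_train], y_train) for j in range(n_edges)]
ranked = sorted(range(n_edges), key=lambda j: abs(train_r[j]), reverse=True)
deciles = split_into_deciles(ranked)

# Score each decile by correlating a signed sum of its edges with held-out y.
decile_scores = []
for edges in deciles:
    signs = {j: (1.0 if train_r[j] >= 0 else -1.0) for j in edges}
    summary = [sum(signs[j] * row[j] for j in edges) for row in X_test]
    decile_scores.append(pearson_r(summary, y_test))

for rank, score in enumerate(decile_scores, start=1):
    print(f"decile {rank}: held-out r = {score:.2f}")
```

Because the deciles share no features, any held-out accuracy a lower decile reaches cannot be attributed to the top-ranked edges, which is what makes the comparison between deciles informative.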

These results have immediate implications for the design and interpretation of brain‑based biomarkers. Researchers should move beyond a single‑feature‑set narrative and acknowledge that multiple, equally predictive models may coexist, each offering distinct mechanistic insights. This broader perspective can enhance the robustness of precision‑medicine strategies, improve reproducibility, and encourage the development of analytic pipelines that preserve the full richness of neuroimaging data. Future work will likely explore ensemble approaches that integrate diverse decile models, fostering a more nuanced understanding of how distributed brain circuits drive behaviour and disease.
