Development and Validation of an Explainable Machine Learning-Based Risk Prediction Model for Obesity in Chinese Children and Adolescents: A Population-Based Study

Frontiers in Nutrition
May 6, 2026

Why It Matters

The tool offers a highly accurate, interpretable method for early obesity detection, allowing schools and health agencies to target interventions and curb the rising tide of childhood obesity.

Key Takeaways

  • Random forest model reached an AUC of 0.946 on the hold-out test set
  • Parental BMI and weekday mobile device use are the top predictors
  • On a 2020 temporal validation sample, AUC fell to 0.810
  • Web calculator allows instant obesity risk assessment for families
  • Study uses 35,016 participants across 31 Chinese provinces

Pulse Analysis

Childhood obesity has surged worldwide, and China’s rapid urbanization has amplified the problem, with prevalence rates now exceeding 20% among adolescents. Early identification of at‑risk youth is essential for preventing long‑term health complications such as type‑2 diabetes and cardiovascular disease. Traditional risk scores often rely on complex clinical measurements and lack transparency, limiting their adoption in schools or community settings. By leveraging explainable machine‑learning techniques, researchers can combine large‑scale survey data with intuitive visual explanations, bridging the gap between statistical accuracy and practical usability.

The research team assembled a nationally representative cohort of 35,016 children and adolescents from 31 provinces in the 2017‑2018 Physical Activity and Fitness in China study, and later tested the model on a separate 2020 sample of 3,495 participants. After narrowing 38 candidate variables with LASSO and recursive feature elimination, the team compared eight algorithms. The random‑forest classifier performed best, with an area under the curve (AUC) of 0.946 on the hold‑out set and 0.810 on temporal validation. SHAP analysis highlighted parental body‑mass index, weekday mobile device use, moderate‑to‑vigorous physical activity, television viewing, and sex as the strongest predictors.
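To make the two AUC figures easier to read, here is a minimal sketch of what the metric measures: the share of (obese, non-obese) pairs in which the model gives the obese child the higher risk score. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not the study's code or data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the AUC as the fraction of (positive, negative) pairs in which
    the positive case receives the higher score; ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = obese, 0 = not obese; scores are predicted risks.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 11 of 12 pairs ranked correctly
```

Read this way, an AUC of 0.946 means the model ranks about 95 of every 100 such pairs correctly on the hold‑out set, and about 81 of every 100 on the 2020 sample. A value of 0.5 would be no better than chance.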

The deployment of a web‑based risk calculator translates these findings into an actionable tool for parents, educators, and public‑health officials, enabling rapid, on‑the‑spot screening without laboratory tests. Such a scalable solution can inform targeted nutrition and activity programs, prioritize resources for high‑risk neighborhoods, and support longitudinal monitoring of intervention outcomes. Moreover, the study demonstrates that explainable AI can meet the dual demands of predictive performance and interpretability, setting a precedent for similar health‑risk models in other chronic‑disease domains across emerging economies.
