Development and Validation of a Multimodal Interpretable Machine Learning Model with SHAP for Malignancy Risk Prediction in Bethesda III Thyroid Nodules: A Dual-Center Retrospective Cohort Study
Why It Matters
Accurate, non‑invasive risk assessment for Bethesda III nodules can spare patients unnecessary surgery and guide personalized treatment, addressing a long‑standing diagnostic gap in thyroid cancer care.
Key Takeaways
- Thyroglobulin and BRAF V600E are the top independent predictors
- Multimodal fusion model reached AUC 0.959 in the training set
- XGBoost showed the best cross‑center performance, with AUC 0.890
- Four‑tier risk stratification gives 92% NPV for low‑risk BRAF‑negative nodules
Pulse Analysis
Thyroid nodules classified as Bethesda III present a diagnostic dilemma because cytology alone cannot reliably distinguish benign from malignant lesions. Clinicians have traditionally relied on repeat biopsies, molecular testing, or diagnostic surgery, each carrying cost and patient‑burden implications. Integrating diverse data streams—patient demographics, serum thyroglobulin levels, high‑resolution ultrasound characteristics, and molecular markers like BRAF V600E—offers a more nuanced risk profile, but requires sophisticated analytics to synthesize the information effectively.
The study compared six machine‑learning algorithms and built hierarchical models that progressively added data modalities. The multimodal fusion model, particularly the XGBoost implementation, reached an AUC of 0.959 in the development cohort and 0.890 in the external validation set; the external figure is the more realistic estimate of how the model performs at a second center. SHapley Additive exPlanations (SHAP) made individual predictions interpretable and identified BRAF V600E mutation as the primary driver of risk predictions, which addresses clinicians' demand for explainable AI. Sensitivity analyses showed that excluding reclassified NIFTP lesions did not reduce predictive performance, which supports the model's stability.
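To make the SHAP idea concrete, here is a minimal sketch of the quantity SHAP estimates: exact Shapley values for a single prediction, computed by enumerating every feature coalition. Everything in it is an assumption for illustration. The `toy_risk` function, its weights, and the feature names `braf_v600e`, `high_tg`, and `suspicious_us` are invented and are not the study's model, and "absent" features are simply set to a reference value rather than averaged over background data as SHAP does.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `model` at `instance`, relative to `baseline`."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        # Features in the coalition take the patient's values;
        # all other features stay at the reference (baseline) values.
        x = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return model(x)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_risk(x):
    # Hypothetical risk score with invented weights, NOT the study's model.
    return (0.05
            + 0.5 * x["braf_v600e"]
            + 0.2 * x["high_tg"]
            + 0.1 * x["braf_v600e"] * x["suspicious_us"])


patient = {"braf_v600e": 1, "high_tg": 1, "suspicious_us": 1}
reference = {"braf_v600e": 0, "high_tg": 0, "suspicious_us": 0}

phi = shapley_values(toy_risk, patient, reference)
for name, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's score and the reference score, which is the property that lets a clinician read each value as that feature's share of the prediction. Exact enumeration grows exponentially with the number of features, so practical tools rely on approximations or model-specific algorithms instead.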
For practice, the four‑tier risk stratification scheme gives physicians actionable thresholds, especially for BRAF‑negative patients, who historically lack clear molecular guidance. The low‑risk tier's 92% negative predictive value can justify observation over surgery, while the high‑risk tier's 100% positive predictive value supports definitive intervention. By delivering interpretable risk estimates, this approach could reduce unnecessary procedures, lower healthcare costs, and improve patient outcomes, and it offers a reference point for AI‑driven decision support in endocrine oncology.
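The tier-level figures quoted above can be read as simple proportions: the negative predictive value of the low-risk tier is the share of nodules in that tier that proved benign, and the positive predictive value of the high-risk tier is the share that proved malignant. The sketch below shows that arithmetic under stated assumptions: the cut-offs in `CUTOFFS`, the names of the two middle tiers, and the ten example nodules are all invented for illustration and do not come from the study.

```python
from bisect import bisect_right

# Hypothetical cut-offs on predicted malignancy probability (not the study's thresholds).
CUTOFFS = [0.10, 0.40, 0.75]
# "low" and "high" follow the article; the middle tier names are assumed.
TIERS = ["low", "intermediate-low", "intermediate-high", "high"]


def assign_tier(probability):
    """Map a predicted probability to one of four tiers."""
    return TIERS[bisect_right(CUTOFFS, probability)]


def tier_summary(probabilities, labels):
    """Per tier: nodule count and observed malignancy rate (label 1 = malignant)."""
    counts = {tier: [0, 0] for tier in TIERS}  # [n nodules, n malignant]
    for p, y in zip(probabilities, labels):
        entry = counts[assign_tier(p)]
        entry[0] += 1
        entry[1] += y
    return {
        tier: {"n": n, "malignancy_rate": (m / n if n else None)}
        for tier, (n, m) in counts.items()
    }


# Synthetic example data, invented purely to exercise the functions.
probs = [0.03, 0.05, 0.08, 0.09, 0.20, 0.35, 0.50, 0.60, 0.80, 0.90]
truth = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

summary = tier_summary(probs, truth)
npv_low = 1 - summary["low"]["malignancy_rate"]   # benign share of the low tier
ppv_high = summary["high"]["malignancy_rate"]     # malignant share of the high tier
print(f"low-tier NPV: {npv_low:.2f}, high-tier PPV: {ppv_high:.2f}")
```

With very few nodules in a tier, as in this toy data, a value such as 100% rests on a handful of cases, so tier sizes are worth reporting next to the percentages.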