A smart system for autism risk assessment: Predictive model based on machine learning and big data
Researchers: Ayelet Ben-Sasson (1), Lidia Gabis (2, 3)
- (1) University of Haifa
- (2) Maccabi HealthCare Services
- (3) University of Tel Aviv
Background: Early autism spectrum disorder (ASD) detection is critical for maximizing the benefits of early intervention. However, community providers are limited in resources for screening and lack ASD-specific expertise. Machine Learning (ML) methods offer an opportunity to utilize population-based routine surveillance records for predicting an infant’s ASD risk level.
Objectives: (1) Build an ML model that predicts ASD diagnosis from an infant’s electronic health record (EHR), as generated by a national screening program, and test its accuracy.
(2) Characterize the most important predictors of ASD.
Method: The database included the EHRs of 780,610 children, of whom 1,163 had an ASD diagnosis. Each EHR consisted of data from routine screening visits: birth parameters, growth measurements, developmental progress, and post-natal variables. Model features were limited to measurements from the first 2 years of life. A gradient boosting ML model was applied with 3-fold cross-validation, and the SHapley Additive exPlanations (SHAP) tool was used to quantify feature importance.
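To make the evaluation step concrete, the sketch below shows one way a per-fold AUC and its mean and SD across three held-out folds could be computed. It is an illustration only, not the study's code: the function names `auc` and `summarize_folds` are hypothetical, the predictions are made-up toy numbers, and the use of the sample standard deviation is an assumption (the abstract does not say which SD was reported). No model is fitted here; in a real pipeline the scores would come from a gradient boosting model trained on the other two folds.

```python
from statistics import mean, stdev


def auc(labels, scores):
    """Rank-based AUC: chance that a random positive case outscores a random negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def summarize_folds(folds):
    """folds: list of (labels, scores) pairs, one pair per held-out fold."""
    aucs = [auc(y, s) for y, s in folds]
    return mean(aucs), stdev(aucs)  # sample SD: an assumption, see lead-in


# Toy held-out predictions for 3 folds (hypothetical values, not study data).
toy_folds = [
    ([1, 0, 0, 1, 0], [0.9, 0.2, 0.4, 0.7, 0.1]),
    ([0, 1, 0, 0, 1], [0.3, 0.8, 0.6, 0.2, 0.5]),
    ([1, 0, 1, 0, 0], [0.6, 0.1, 0.9, 0.3, 0.2]),
]
mean_auc, sd_auc = summarize_folds(toy_folds)
print(f"mean AUC = {mean_auc:.3f}, SD = {sd_auc:.3f}")
```

The pairwise comparison is quadratic in the number of cases, which is fine for a toy example but would need a rank-based formulation for a cohort of this size.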
Findings: The initial model comprised 100 features, which were reduced in an iterative process, leading to a final model with an average AUC = 0.86 (SD < 0.002). The detected high-risk group had an ASD incidence rate of 0.60, a 600-fold increase over the incidence in the entire cohort (0.001). Of the 20 most important features in each fold, 15 were shared across folds: six failed milestones during the second year of life (from the language, social, and fine motor domains), male gender, parental concern about development, not nursing, older maternal age, lower gestational week at birth, and four higher growth indices.
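The fold increase reported above is a ratio of two rates, and the short sketch below reproduces that arithmetic from the figures stated in the abstract. The variable names are hypothetical. As an extra step, it also derives a crude cohort prevalence from the counts given in the Method section; the ratio is sensitive to how the cohort rate in the denominator is rounded.

```python
# Rates as stated in the abstract (rounded values).
high_risk_rate = 0.60
cohort_rate = 0.001
fold_increase = high_risk_rate / cohort_rate

# Crude prevalence from the counts in the Method section.
n_children = 780_610
n_asd = 1_163
crude_prevalence = n_asd / n_children

print(f"fold increase from stated rates: {fold_increase:.0f}")
print(f"crude prevalence from counts: {crude_prevalence:.4f}")
```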
Conclusions: ML methods can use preventative care data to estimate ASD risk level in children under 2 years of age while accounting for non-linear relations between birth, growth, post-natal, and developmental parameters.
Recommendations: Train providers to detect predictive milestones, and improve the sensitivity of established ASD markers that did not enter the model. Add an ASD-specific screening tool to routine baby wellness visits.
Research number: R/351/2020
Research end date: 04/2023
