Hypertension Prediction in Adolescents Using Anthropometric Measurements: Do Machine Learning Models Perform Equally Well?

Chai, Soo See and Goh, Kok Luong and Cheah, Whye Lian and Chang, Robin Yee Hui and Ng, Giap Weng (2022) Hypertension Prediction in Adolescents Using Anthropometric Measurements: Do Machine Learning Models Perform Equally Well? Applied Sciences, 12 (1600). pp. 1-17. ISSN 2076-3417

Official URL: https://www.mdpi.com/journal/applsci

Abstract

The use of anthropometric measurements in machine learning algorithms for hypertension prediction enables the development of simple, non-invasive prediction models. However, previous studies have used different machine learning algorithms with various anthropometric data, either alone or combined with other biophysical and lifestyle variables. It is therefore essential to assess how the choice of machine learning model affects predictions made from simple anthropometric measurements alone. We developed and tested 13 machine learning methods from the neural network, ensemble, and classical categories to predict hypertension in adolescents using only simple anthropometric measurements. The imbalanced dataset of 2461 samples, 30.1% of them hypertensive subjects, was first partitioned into 90% for training and 10% for validation. Using correlation coefficients, the training dataset was reduced to eight simple anthropometric measurements: age, C index, ethnicity, gender, height, location, parental hypertension, and waist circumference. The Synthetic Minority Oversampling Technique (SMOTE) combined with random under-sampling was used to balance the dataset. The models with optimal hyperparameters were assessed on the testing dataset using seven performance measures: accuracy, precision, sensitivity, specificity, F1-score, misclassification rate, and AUC. No model consistently outperformed the others across all seven measures. LightGBM was the best model on six of the seven metrics (all except sensitivity), whereas Decision Tree was the worst. We proposed using Bayes’ theorem to assess the models’ applicability to the Sarawak adolescent population; this ranked LightGBM, Random Forest, XGBoost, and CatBoost as the top four models and Logistic Regression, LogitBoost, SVM, and Decision Tree as the bottom four. This study demonstrates that the choice of machine learning model affects the prediction outcomes.
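The abstract balances the training data with SMOTE followed by random under-sampling. The Python below is a minimal sketch of that general technique. It is not the authors' code, and their parameters and tooling are not given in this record. Each synthetic minority sample is created by picking a minority point, picking one of its k nearest minority neighbours, and placing a new point at a uniformly random position on the segment between them. The majority class is then cut down to a chosen size at random. The function names, the toy data, and the target sizes (`oversample_to`, `undersample_to`) are hypothetical. The sketch also assumes purely numeric features. Categorical inputs such as ethnicity, gender, or location would get fractional codes under plain interpolation, which is why variants such as SMOTE-NC exist, and the abstract does not say which variant was used.

```python
"""Toy SMOTE + random under-sampling in pure Python (assumes numeric features only)."""
import math
import random


def _k_nearest(idx, points, k):
    """Return the k points nearest (Euclidean) to points[idx], excluding that point itself."""
    others = [p for j, p in enumerate(points) if j != idx]
    others.sort(key=lambda p: math.dist(points[idx], p))
    return others[:k]


def smote(minority, n_synthetic, k=5, rng=None):
    """Generate n_synthetic minority samples on segments between a sample and a neighbour."""
    rng = rng or random.Random(0)
    if n_synthetic > 0 and len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        neighbour = rng.choice(_k_nearest(idx, minority, k))
        gap = rng.random()  # uniform in [0, 1): position along the base -> neighbour segment
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Keep n_keep majority samples chosen uniformly at random, without replacement."""
    rng = rng or random.Random(0)
    return rng.sample(majority, n_keep)


def rebalance(majority, minority, oversample_to, undersample_to, k=5, seed=0):
    """Oversample the minority class with SMOTE, then randomly under-sample the majority."""
    rng = random.Random(seed)
    extra = max(0, oversample_to - len(minority))
    new_minority = list(minority) + smote(minority, extra, k=k, rng=rng)
    new_majority = random_undersample(majority, min(undersample_to, len(majority)), rng=rng)
    return new_majority, new_minority


if __name__ == "__main__":
    # Hypothetical 2-feature toy data, not the study's dataset.
    majority = [(float(i), float(i % 3)) for i in range(10)]
    minority = [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]
    new_majority, new_minority = rebalance(majority, minority, oversample_to=8, undersample_to=8)
    print(len(new_majority), len(new_minority))  # prints "8 8"
```

Resampling is normally applied to the training split only, so that the evaluation data keeps its original class balance.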
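All seven performance measures listed in the abstract can be computed from a model's confusion matrix plus, for AUC, its predicted scores. The sketch below uses the standard definitions: misclassification rate is 1 minus accuracy, and AUC is computed in its rank form (the probability that a randomly chosen positive scores above a randomly chosen negative). The abstract does not spell out how Bayes' theorem was applied. One common approach, assumed here, is to recompute the positive and negative predictive values from sensitivity and specificity at a target prevalence. All function names and counts are hypothetical. The 0.301 prior is the study sample's hypertension proportion, used only as an example and not as a Sarawak population prevalence.

```python
"""Confusion-matrix metrics, rank-based AUC, and Bayes'-theorem predictive values."""


def confusion_metrics(tp, fp, tn, fn):
    """Six threshold-based metrics computed from raw confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # recall, true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "misclassification_rate": 1 - accuracy,
    }


def rank_auc(pos_scores, neg_scores):
    """AUC as P(random positive's score > random negative's score); ties count one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def predictive_values(sensitivity, specificity, prevalence):
    """Bayes' theorem: P(hypertensive | positive) and P(not hypertensive | negative)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not results from the study.
    m = confusion_metrics(tp=40, fp=10, tn=140, fn=20)
    print(m)
    # Example prior only: the study sample's 30.1% hypertension proportion.
    print(predictive_values(m["sensitivity"], m["specificity"], prevalence=0.301))
    print(rank_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

When the prevalence passed in equals the positive share of the test set, the predictive positive value reduces to precision. Because predictive values depend on prevalence, models that look similar on a test set can separate once a different prior is used, which is one way a prevalence-aware comparison can reorder them.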

Item Type: Article
Additional Information: Information, Communication and Creative Technology
Uncontrolled Keywords: adolescents; anthropometric; hypertension; imbalanced dataset; machine learning prediction; SMOTE.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
R Medicine > RA Public aspects of medicine > RA0421 Public health. Hygiene. Preventive Medicine
Divisions: Academic Faculties, Institutes and Centres > Faculty of Computer Science and Information Technology
Depositing User: See
Date Deposited: 15 Feb 2022 06:57
Last Modified: 07 Sep 2022 01:56
URI: http://ir.unimas.my/id/eprint/37912
