TY - GEN
T1 - Improving Health Status Prediction by Applying Appropriate Missing Value Imputation Technique
AU - Tabassum, Shaira
AU - Abedin, Nuren
AU - Maruf, Rafiqul Islam
AU - Taufiq Ahmed, Mostafa
AU - Ahmed, Ashir
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Missing information is common in health data, especially in remote healthcare systems. Missing data in medical domains reduces the representativeness of samples, biases estimates, and leads to improper conclusions. These missing values need to be handled efficiently by selecting an appropriate imputation technique. This paper aims to find a suitable imputation technique for remote healthcare data. We use our Portable Health Clinic (PHC) dataset, which was collected over 12 years from different locations in Bangladesh and in which 20% of data items were found to be missing. We carried out a comparative analysis of eight missing value handling methods by applying them with five state-of-the-art machine learning models to the PHC dataset. The imputation performance in each case is evaluated based on accuracy and F1-score. Multiple Imputation by Chained Equations (MICE) achieved the highest accuracy and F1-score in all cases. Thus, this study demonstrates MICE as the best-performing missing value imputation technique across all combinations of machine learning process and algorithm evaluated.
UR - http://www.scopus.com/inward/record.url?scp=85129137382&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85129137382&partnerID=8YFLogxK
U2 - 10.1109/LifeTech53646.2022.9754794
DO - 10.1109/LifeTech53646.2022.9754794
M3 - Conference contribution
AN - SCOPUS:85129137382
T3 - LifeTech 2022 - 2022 IEEE 4th Global Conference on Life Sciences and Technologies
SP - 345
EP - 348
BT - LifeTech 2022 - 2022 IEEE 4th Global Conference on Life Sciences and Technologies
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 4th IEEE Global Conference on Life Sciences and Technologies, LifeTech 2022
Y2 - 7 March 2022 through 9 March 2022
ER -