TY - GEN
T1 - A data enhancement approach to improve machine learning performance for predicting health status using remote healthcare data
AU - Tabassum, Shaira
AU - Sampa, Masuda Begum
AU - Islam, Rafiqul
AU - Yokota, Fumihiko
AU - Nakashima, Naoki
AU - Ahmed, Ashir
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/11/28
Y1 - 2020/11/28
AB - Machine learning (ML) is becoming increasingly important for improving the performance of remote healthcare systems. Portable Health Clinic (PHC), a remote healthcare system, contains a triage function that classifies patients into two major groups: (a) healthy and (b) unhealthy. Unhealthy patients require regular health checkups. This paper aims to predict the health status of registered patients in order to decide the follow-up date and frequency. Health management costs can be reduced by decreasing the frequency of follow-ups. We carried out an experiment on 271 corporate members, monitoring their health status every three months and collecting four phases of data. The data records contain clinical, socio-demographic, and dietary behavior data. However, most machine learning algorithms cannot work directly with categorical data. Several encoding techniques are available, which can also enhance prediction performance. In this paper, we applied three encoding techniques and proposed a new encoding approach to handle categorical variables. The results show that the Random Forest classifier performs best, with 95.33% accuracy. A comparison chart displaying the performance of eight different supervised learning algorithms under the three existing encoding mechanisms is reported.
UR - http://www.scopus.com/inward/record.url?scp=85101003434&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85101003434&partnerID=8YFLogxK
U2 - 10.1109/ICAICT51780.2020.9333506
DO - 10.1109/ICAICT51780.2020.9333506
M3 - Conference contribution
AN - SCOPUS:85101003434
T3 - 2020 2nd International Conference on Advanced Information and Communication Technology, ICAICT 2020
SP - 308
EP - 312
BT - 2020 2nd International Conference on Advanced Information and Communication Technology, ICAICT 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2nd International Conference on Advanced Information and Communication Technology, ICAICT 2020
Y2 - 28 November 2020 through 29 November 2020
ER -