Handling Imbalanced Data for Acute Coronary Syndrome Classification Based on Ensemble and K-Means SMOTE Method

Muhammad Faris Muzakki; Rizal Dwi Prayogo; M Afif Rizky A

doi:10.30630/joiv.7.3-2.1429

Muhammad Muzakki - Institut Teknologi Bandung, Bandung 40132, Indonesia
Rizal Dwi Prayogo - Institut Teknologi Bandung, Bandung 40132, Indonesia
M Afif Rizky A - Institut Teknologi Bandung, Bandung 40132, Indonesia

DOI: http://dx.doi.org/10.30630/joiv.7.3-2.1429

Abstract

Acute Coronary Syndrome (ACS) is a disease with a high mortality rate: 40% within 5 years of diagnosis. Despite this high mortality rate, the conventional diagnostic process tends to overestimate ACS, which can itself be life-threatening. For this reason, several prediagnosis alternatives have been investigated to reduce reliance on intensive ACS detection, one of which is a machine learning approach. The machine learning-based prediagnosis approach uses patient medical record data as input for building detection models. This approach can produce an optimal model when the dataset is large and the class labels are fairly balanced. However, in machine learning-based ACS detection studies, researchers often lack balanced data between positive and negative labels, and this imbalance can cause overfitting. The problem arises because additional data with specific labels are difficult to obtain. To address the class imbalance in ACS detection, we generated synthetic ACS data using the K-Means SMOTE method. The synthetic data are used as training data to build an ensemble-based machine learning model. In this study, we obtain an increase in the F1 score of more than 10% compared with machine learning models that do not use K-Means SMOTE for oversampling. In addition to the higher F1 score, the resulting models are more resistant to overfitting because the training set is more diverse.
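
The oversampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the function names (`dist2`, `kmeans`, `kmeans_smote`), and the parameters are assumptions made for illustration, and the procedure is simplified. It clusters all samples with k-means, keeps clusters dominated by the minority class, and creates synthetic minority samples by interpolating between two minority samples of the same cluster. The full K-Means SMOTE method additionally weights clusters by sparsity and interpolates toward nearest neighbours, which this sketch omits.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Lloyd's k-means with farthest-first initialisation (deterministic)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        assign = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        # Move each center to the mean of its members (unchanged if empty).
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def kmeans_smote(X, y, minority=1, k=2, seed=0):
    """Simplified K-Means SMOTE: oversample `minority` until classes balance."""
    rng = random.Random(seed)
    assign = kmeans(X, k)
    n_minority = sum(1 for label in y if label == minority)
    n_needed = (len(y) - n_minority) - n_minority
    # Keep clusters where the minority class dominates and has >= 2 samples.
    pools = []
    for c in range(k):
        size = sum(1 for a in assign if a == c)
        mins = [x for x, lab, a in zip(X, y, assign) if a == c and lab == minority]
        if len(mins) >= 2 and len(mins) > size - len(mins):
            pools.append(mins)
    synthetic = []
    while pools and len(synthetic) < n_needed:
        a, b = rng.sample(rng.choice(pools), 2)
        t = rng.random()
        # SMOTE step: a random point on the segment between a and b.
        synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return X + synthetic, y + [minority] * len(synthetic)

# Hypothetical toy data: 6 negative samples, 2 positive (minority) samples.
X = [[0.0, 0.0], [0.5, 0.2], [0.1, 0.9], [1.0, 0.4],
     [0.3, 0.6], [0.8, 0.8], [5.0, 5.0], [5.5, 5.2]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_new, y_new = kmeans_smote(X, y)
print(len(X_new), y_new.count(0), y_new.count(1))
```

As is usual for oversampling, only the training split would be resampled in this way, with the test split left untouched so that the F1 score reflects the original class distribution. In practice a maintained implementation such as the `KMeansSMOTE` class in the imbalanced-learn package is likely a better choice than a hand-written version.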

Keywords

Acute Coronary Syndrome, imbalance learning, k-Means SMOTE
