Feature Minimization for Diabetic Disorders High Performances Prediction System-based on Random Forest Tree

Sahar Mohammed - University of Diyala, Baquba, Diyala, 32001, Iraq
Ali Ahmed - University of Diyala, Baquba, Diyala, 32001, Iraq
Mohammed Mohammed - University of Diyala, Baquba, Diyala, 32001, Iraq


DOI: http://dx.doi.org/10.30630/joiv.7.3-2.1868

Abstract


Human organ failure due to high blood sugar is considered a chronic disease. Early prediction can reduce or prevent the complications of such disorders, especially given recent improvements in machine learning (ML) techniques and the availability of electronic data from different sources. The number of diabetic patients has increased and may exceed 600 million within twenty years. Transforming data into valuable, helpful information is one way researchers try to improve the performance of ML techniques. This paper applies several classification techniques (Random Forest tree, Hoeffding tree, LWL, NB updatable, and Support Vector Machine) to a diabetic dataset of 1000 samples, each described by a set of attributes and one of three diabetes class types. It focuses on the parameters that most affect overall prediction accuracy. ML performance is measured by accuracy and mean absolute error for several cross-validation values, both before feature reduction and after feature minimization using feature selection methods. The results show that Gender, Age, Blood Sugar Level (HbA1c), Triglycerides (TG), and Body Mass Index (BMI) are the most influential attributes. They also show that the Random Forest tree was the best method (97.7 and 98.6 %) with and without feature minimization, respectively, but it has a higher performance by omitting some unbalanced features from the diabetic dataset. Weight minimization has also been applied to techniques such as SVM to obtain a better separating plane and a more robust model. In addition, this study specifies which parameters have weight minimization, with the required analysis. Feature selection was also applied to save memory and reduce computation time.
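The evaluation protocol described in the abstract (accuracy and mean absolute error over several cross-validation values, before and after feature reduction) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' setup: the data are synthetic, a simple nearest-centroid classifier stands in for the Random Forest tree and the other methods, the reduced feature subset is chosen by hand rather than by a feature selection method, and mean absolute error is computed on integer-coded class labels.

```python
import random
from statistics import mean

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_centroids(X, y):
    """Return {label: centroid} computed from the training rows."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {lab: [mean(col) for col in zip(*rows)] for lab, rows in groups.items()}

def predict(centroids, row):
    """Predict the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))

def cross_validate(X, y, k, columns):
    """Mean accuracy and mean absolute error over k folds, using only `columns`."""
    Xs = [[row[c] for c in columns] for row in X]
    accs, maes = [], []
    for test_idx in k_fold_indices(len(Xs), k):
        held_out = set(test_idx)
        train = [i for i in range(len(Xs)) if i not in held_out]
        model = fit_centroids([Xs[i] for i in train], [y[i] for i in train])
        preds = [predict(model, Xs[i]) for i in test_idx]
        truth = [y[i] for i in test_idx]
        accs.append(mean(1.0 if p == t else 0.0 for p, t in zip(preds, truth)))
        maes.append(mean(abs(p - t) for p, t in zip(preds, truth)))
    return mean(accs), mean(maes)

# Synthetic stand-in data (assumption): three classes, two informative
# columns (0, 1) and two high-variance noise columns (2, 3).
rng = random.Random(0)
X, y = [], []
for i in range(150):
    label = i % 3  # interleave classes so every fold contains all three
    informative = [label * 3.0 + rng.gauss(0, 0.3), label * 2.0 + rng.gauss(0, 0.3)]
    noise = [rng.gauss(0, 10.0), rng.gauss(0, 10.0)]
    X.append(informative + noise)
    y.append(label)

results = {}
for k in (5, 10):  # several cross-validation values
    results[k] = {
        "all_features": cross_validate(X, y, k, columns=[0, 1, 2, 3]),
        "reduced": cross_validate(X, y, k, columns=[0, 1]),
    }
    print(k, results[k])
```

Each entry in `results` holds an (accuracy, mean absolute error) pair, so the effect of dropping the noisy columns can be compared for each number of folds. Swapping in a real classifier and a real feature selection method would change only `fit_centroids`, `predict`, and the `columns` argument.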

Keywords


Decision Tree; Naïve Bayes; Support Vector Machine; Diabetic Disease

