A Comparative Analysis of Combination of CNN-Based Models with Ensemble Learning on Imbalanced Data

Xiaoling Gao - Universiti Teknologi MARA, Shah Alam 40450, Selangor, Malaysia
Nursuriati Jamil - Universiti Teknologi MARA, Shah Alam 40450, Selangor, Malaysia
Muhammad Ramli - Universiti Teknologi MARA, Shah Alam 40450, Selangor, Malaysia
Syed Mohd Zahid Syed Zainal Ariffin - Universiti Teknologi MARA, Shah Alam 40450, Selangor, Malaysia


DOI: http://dx.doi.org/10.62527/joiv.8.1.2194


This study investigates the effectiveness of the Synthetic Minority Oversampling Technique (SMOTE) in conjunction with convolutional neural network (CNN) models, covering both single and ensemble classifiers, to address the challenge of multi-class imbalanced image classification. Although CNNs have proven successful in image classification and ensemble learning has further improved their performance, the application of SMOTE to imbalanced image datasets remains underexplored. To examine whether SMOTE can improve classification accuracy and related performance measures when combined with CNN-based classifiers, we use a CIFAR-10 dataset that has been artificially step-imbalanced at varying imbalance ratios. We conducted experiments with five models, namely AdaBoost, XGBoost, a standalone CNN, CNN-AdaBoost, and CNN-XGBoost, on both the imbalanced and the SMOTE-balanced datasets. Accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC) were used for evaluation. The findings indicate that SMOTE substantially improves minority-class accuracy, and that combining ensemble classifiers with CNNs and oversampling techniques significantly improves overall classification performance, particularly under severe class imbalance. This study demonstrates the potential of merging oversampling techniques with CNN-based ensemble classifiers to mitigate the effects of class imbalance in image datasets, suggesting a promising direction for future research.


Keywords: deep learning; ensemble learning; SMOTE; imbalanced data; image classification.
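The per-class metrics named in the abstract (precision, recall, F1-score) are what expose minority-class degradation that overall accuracy hides. A short scikit-learn sketch, using fabricated placeholder labels for a 3-class problem where class 2 is the minority (not results from the paper):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical predictions: classes 0 and 1 have 50 samples each, class 2 only 5.
y_true = np.array([0] * 50 + [1] * 50 + [2] * 5)
y_pred = np.array([0] * 48 + [1] * 2      # class-0 samples: 48 right, 2 wrong
                  + [1] * 49 + [0] * 1    # class-1 samples: 49 right, 1 wrong
                  + [2] * 3 + [0] * 2)    # class-2 samples: only 3 of 5 right

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1, 2], zero_division=0)

# Overall accuracy looks healthy, but minority-class recall tells the real story.
print(f"accuracy={acc:.3f}")
print(f"per-class recall={rec}")
```

Here overall accuracy is about 0.95 while recall on the minority class is only 0.60, which is exactly the gap that oversampling with SMOTE is intended to close.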



