Hybrid Approach with Distance Feature for Multi-Class Imbalanced Datasets

Hartono Hartono; Erianto Ongko

doi:10.30630/joiv.7.1.1292

Hartono Hartono - Universitas Potensi Utama, Medan, Indonesia
Erianto Ongko - Akademi Teknologi Industri Immanuel, Medan, Indonesia

DOI: http://dx.doi.org/10.30630/joiv.7.1.1292

Abstract

The multi-class imbalance problem is more complex than the binary-class problem. The difficulty stems from the larger number of classes, which increases the likelihood of overlap between classes. Many approaches have been proposed to deal with multi-class imbalance. One is a hybrid approach that combines a data-level approach with an algorithm-level approach: it applies an ensemble of classifiers together with oversampling of the minority class. SMOTE is an oversampling method that performs well, but it requires choosing the best samples to use in the interpolation process that generates new samples. This need arises from the overlap between classes that always accompanies the multi-class imbalance problem. Because of this overlap, a safe region must be determined in which SMOTE synthesizes samples during oversampling. The safe region is considered the best place to synthesize samples because it has a lower tendency to overlap. The safe region can be determined by constructing distance features. The sample with the best distance and the lowest imbalance ratio is selected for the oversampling process with SMOTE. The main contribution of this research is the proposed Hybrid Approach with Distance Feature, which can determine safe samples; its main advantage is that, in addition to handling multi-class imbalance, it also handles overlapping better. The results of this study are compared with Multiple Random Balance (MultiRandBal), which performs random oversampling. The results show that the Augmented R-Value, Class Average Accuracy, Class Balance Accuracy, and Hamming Loss obtained with this method were better than those obtained with random oversampling. These results also show that the Hybrid Approach with Distance Feature handles multi-class imbalance better than MultiRandBal.
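
The idea of restricting SMOTE interpolation to samples in a safe region can be illustrated with a minimal Python sketch. The abstract does not define the distance feature, so the score used here (distance to the nearest sample of another class divided by distance to the nearest sample of the same class) is an assumption made only for illustration, and all function names are hypothetical. The sketch also interpolates toward the single nearest safe neighbour rather than a random one of k neighbours, and it omits the imbalance-ratio criterion and the classifier ensemble.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def safety_score(sample, same_class, other_class):
    """Hypothetical distance feature (an assumption, not the paper's definition).

    Ratio of the nearest other-class distance to the nearest same-class
    distance. A larger value suggests the sample lies deeper inside its
    own class region, away from the overlap.
    """
    friend = min(euclidean(sample, s) for s in same_class if s is not sample)
    enemy = min(euclidean(sample, o) for o in other_class)
    return enemy / (friend + 1e-12)  # small constant avoids division by zero


def select_safe_samples(minority, others, keep):
    """Return the `keep` minority samples with the highest safety score."""
    ranked = sorted(
        minority,
        key=lambda s: safety_score(s, minority, others),
        reverse=True,
    )
    return ranked[:keep]


def smote_interpolate(a, b, rng):
    """SMOTE-style synthesis: a random point on the segment from a to b."""
    gap = rng.random()
    return [x + gap * (y - x) for x, y in zip(a, b)]


def oversample(minority, others, n_new, keep, seed=0):
    """Generate n_new synthetic samples using only safe minority samples.

    Requires keep >= 2 so that every base sample has a safe neighbour.
    """
    rng = random.Random(seed)
    safe = select_safe_samples(minority, others, keep)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(safe)
        # Simplification: use the nearest safe neighbour (k = 1).
        neighbour = min(
            (s for s in safe if s is not base),
            key=lambda s: euclidean(base, s),
        )
        synthetic.append(smote_interpolate(base, neighbour, rng))
    return synthetic


# Toy data: the minority point [4.5, 4.5] sits next to the other class.
minority = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [4.5, 4.5]]
others = [[5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
print(select_safe_samples(minority, others, keep=3))
print(oversample(minority, others, n_new=5, keep=3))
```

In this toy data the minority point next to the other class receives the lowest score and is excluded, so every synthetic sample is generated between the three remaining points, away from the overlapping area.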

Keywords

Multi-Class Imbalance; Overlapping; Hybrid Approach; Distance Feature; SMOTE.
