
Abstract— Imbalanced datasets are a common real-world problem in machine learning. Class imbalance is a condition in which the number of minority-class instances is much smaller than the number of majority-class instances, or is simply insufficient, so machine learning models tend to recognize patterns in the majority class better than in the minority class. This problem is one of the most critical challenges in machine learning research, and several methods have been developed to overcome it. However, most of these methods focus only on binary datasets, and few address multiclass datasets. Handling multiclass imbalance is more complex than handling binary imbalance because more classes are involved. These problems call for an algorithm with features that can be adjusted to the difficulties that arise in multiclass imbalanced datasets. One algorithm with such features is the ensemble algorithm Extreme Gradient Boosting (XGBoost). In our experiments, the proposed XGBoost method showed better results than other classification and ensemble algorithms on eight datasets across five evaluation metrics: balanced accuracy, geometric mean, multiclass area under the curve, true positive rate, and true negative rate. For future research, we suggest combining data-level methods with XGBoost. Given its performance gains, XGBoost can serve as a solution and a reference for handling multiclass imbalanced problems. We also recommend testing with datasets containing categorical and continuous data.


I. INTRODUCTION
Class imbalance in a dataset is a condition in which the number of minority-class instances is much smaller than that of the majority class, or so inadequate that the model recognizes patterns in the majority class better than in the minority class. For example, in medicine, the amount of data from patients with diabetes is smaller than that from patients without diabetes. This problem can result in inaccurate classification and can be fatal if deployed in the real world [1]-[3]. The problem of imbalanced datasets is one of the most critical challenges in the machine learning research community. Various methods have been developed to overcome it, such as resampling methods, cost-sensitive approaches, ensemble learning algorithms, kernel-based methods, and active learning methods. These techniques can be categorized into several approaches according to how they address the problem of imbalanced data. The first approach, at the algorithm level, is to create or modify an algorithm to account for the significance of the positive class.
Algorithm-level approaches include cost-sensitive methods and recognition-based approaches [4], [5]. The second approach, at the data level, is carried out in the data preprocessing stage: the class distribution in the data is rebalanced to reduce the effect of an overly dominant majority class on the learning process. One alternative for improving model performance under class imbalance is the ensemble method. An ensemble, in principle, combines a set of trained classifiers into a combined classification model whose performance is better than that of the original classifiers. Boosting, stacking, and bagging are the most popular ensemble methods [6]-[10].
The majority of existing research has focused on binary imbalanced datasets. The multiclass imbalance learning problem is much more challenging than the binary scenario because the decision boundary must distinguish between more classes [2], [11]. Unfortunately, directly applying methods proposed for binary imbalance to multiclass datasets may not be valid. In addition, multiclass imbalance often appears in real-world problems. Three types of challenges are associated with imbalanced multiclass datasets: one majority class with many minority classes, one minority class with many majority classes, and many minority classes with many majority classes. Moreover, the skewed distribution of instances among classes is only one of many sources of difficulty for classification algorithms dealing with multiclass imbalanced datasets. Other difficulties in the data structure are often present as well, such as class overlap, small disjuncts (a minority class may consist of several sub-concepts), and small sample sizes (a lack of representative minority examples) [12]-[14].
Several previous studies have addressed multiclass imbalanced datasets. Abdi and Hashemi [15] proposed a new data-level method named Mahalanobis Distance Oversampling (MDO), which creates synthetic data with the same Mahalanobis distance from the corresponding class mean. Tested on 20 multiclass imbalanced datasets with the MAUC, geometric mean (G-mean), precision, recall, and F-measure metrics, MDO significantly outperformed Borderline-SMOTE, SMOTE, and ADASYN with KNN and RIPPER as base classifiers. Ding et al. [16] proposed a new algorithm-level method, a Weighted Online Sequential Extreme Learning Machine with Kernels (WOS-ELMK) for class imbalance learning (CIL). Unlike the Online Sequential Extreme Learning Machine (OS-ELM), which generally uses random feature mapping, WOS-ELMK uses kernel mapping for online class imbalance learning to overcome the non-optimality of hidden nodes often associated with random feature mapping. Tested on 17 binary and eight multiclass imbalanced datasets with the G-mean metric, WOS-ELMK outperformed VWOS-ELM and KOS-ELM. Meanwhile, Bi and Zhang [17] proposed the Diversified Error-Correcting Output Codes (DECOC) method, which combines Error-Correcting Output Codes (ECOC) to overcome class imbalance with various ensemble learning frameworks to find the best classification algorithm for each individual sub-dataset derived from resampling the original data. Tested on 19 datasets with the accuracy, AUC, G-mean, and F-measure metrics, DECOC outperformed 17 state-of-the-art methods for imbalance learning. The limitation of DECOC is that executing the method is time-consuming.
Most previous research has been limited to handling class imbalance in binary datasets. Therefore, this study focuses on imbalanced classes in multiclass datasets, which are often found in real-world problems. The method proposed in this study uses the XGBoost algorithm, an ensemble method that can be applied to imbalanced multiclass problems. There are two basic reasons for using XGBoost: first, to preserve the natural distribution of the dataset; second, to create a machine learning model that is reliable under imbalanced multiclass conditions. The main novelty and contribution of this research is a solution for handling class imbalance in machine learning models, especially in the multiclass case, while preserving the natural state of the datasets. Furthermore, the proposed method is compared with several popular methods, single classifiers, and other ensemble models. This comparison aims to measure how well the XGBoost algorithm performs relative to popular existing models, especially under multiclass imbalance.
The rest of this paper is organized as follows. Section II describes the methods used to handle the imbalanced class. In Section III, we discuss the results of the proposed method. Finally, Section IV presents our conclusions.

II. MATERIAL AND METHODS
This research was conducted through several stages, as illustrated in Fig. 1. The following subsections explain each stage:

A. Problem Identification
This stage identifies the existing problems in terms of background, problem formulation, scope, objectives, benefits, and methodology.

B. Datasets
The data used in this study are public imbalanced datasets obtained from the KEEL and UCI Machine Learning repositories, consisting of nine imbalanced datasets with varied imbalance ratios and class counts. Table I and Table II show the details of the datasets used.

C. Data Preprocessing
At this stage, the data are checked for missing values, duplicated instances, inconsistencies, and outliers. However, this study found no missing values, duplicated instances, inconsistent data, or outliers.
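As a minimal sketch of how these checks could be carried out (assuming each dataset is loaded as a pandas DataFrame; the file name below is hypothetical):

```python
import pandas as pd

# Hypothetical file name; each KEEL/UCI dataset would be checked the same way.
df = pd.read_csv("new-thyroid.csv")

print(df.isnull().sum())       # missing values per column
print(df.duplicated().sum())   # number of duplicated instances

# A simple IQR-based outlier check on the numeric columns (one of several
# possible checks; the paper does not specify which rule was used).
numeric = df.select_dtypes(include="number")
q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
iqr = q3 - q1
outliers = ((numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)).sum()
print(outliers)                # outlier count per column
```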

D. Classification using Extreme Gradient Boosting
This study proposes the Extreme Gradient Boosting (XGBoost) ensemble method to handle multiclass imbalanced datasets. XGBoost is a tree-based boosting method derived from Gradient Boosting, in an optimized version [20]. Fig. 2 shows that the XGBoost structure is similar to the Random Forest structure; however, each tree model in XGBoost minimizes the residuals of the previous tree model. Unlike GBDT, which generally uses only first-derivative error information, XGBoost performs a second-order Taylor expansion of the cost function and uses both the first and second derivatives. Each tree is trained on the dataset and gradually learns to minimize the residuals of the previous tree. The results of training each tree are then summed using the following equation:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F} \tag{1}$$

where $\mathcal{F} = \{ f(x) = w_{q(x)} \mid q : \mathbb{R}^m \to \{1, \dots, T\},\; w \in \mathbb{R}^T \}$ is the set of all possible classification trees; $q$ represents the structure of each tree and maps an instance to the corresponding leaf index, $T$ is the number of leaves in a tree, $w$ is the vector of leaf weights, and $K$ is the number of trees. The model is trained in an additive manner by adding at each iteration the tree $f_t$ that most reduces the objective, where $\hat{y}_i^{(t-1)}$ is the prediction of instance $i$ at iteration $t-1$, $l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i))$ is the training loss function, and $\Omega$ is the regularization term. The objective function at iteration $t$ is therefore:

$$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\big(y_i,\; \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t) \tag{2}$$
The next step uses a complexity-control (regularization) term to prevent overfitting, which in the standard XGBoost formulation [20] is:

$$\Omega(f) = \gamma T + \tfrac{1}{2}\,\lambda \lVert w \rVert^{2} \tag{3}$$
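For completeness, the second-order Taylor expansion referred to above takes its standard form from the XGBoost derivation [20], with first- and second-order gradient statistics $g_i$ and $h_i$:

$$\mathcal{L}^{(t)} \simeq \sum_{i=1}^{n}\Big[\, l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(x_i) + \tfrac{1}{2}\, h_i f_t^{2}(x_i) \Big] + \Omega(f_t), \qquad g_i = \partial_{\hat{y}^{(t-1)}}\, l\big(y_i, \hat{y}^{(t-1)}\big), \quad h_i = \partial^{2}_{\hat{y}^{(t-1)}}\, l\big(y_i, \hat{y}^{(t-1)}\big)$$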
This boosting model differs from standard Gradient Boosting in how trees are built: although boosting itself is sequential, XGBoost parallelizes the construction of each tree across all CPU cores during training, resulting in shorter computation times [21]-[23].
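As a minimal sketch of how the proposed classifier could be configured for a multiclass problem (the synthetic data and hyperparameter values below are illustrative assumptions, not the paper's reported settings):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in for one of the imbalanced multiclass datasets.
X, y = make_classification(n_samples=500, n_classes=3, n_informative=6,
                           weights=[0.7, 0.2, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

# "multi:softprob" returns per-class probabilities, which the MAUC metric
# requires; the number of classes is inferred from the training labels.
model = XGBClassifier(objective="multi:softprob", n_estimators=100,
                      max_depth=6, learning_rate=0.3, n_jobs=-1)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)
```

Note that no resampling is applied before fitting; this reflects the paper's aim of preserving the natural class distribution of the dataset.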

E. Evaluation
The last stage of this research is evaluating the resulting classification model. The evaluation indicators used in this study are balanced accuracy, the Multiclass Area Under the Curve (MAUC) score, True Positive Rate (TPR), True Negative Rate (TNR), and geometric mean (G-mean) [24]. Together, these five indicators provide a comprehensive evaluation of a classification model under class imbalance [25], [26]. The XGBoost results were compared with classification algorithms such as Logistic Regression, Decision Tree, Gaussian Naïve Bayes, K-Nearest Neighbor, and Support Vector Machine. In addition, we compared against ensemble algorithms, namely bagging and stacking. The bagging structure uses a Decision Tree base classifier; the stacking structure uses the K-Nearest Neighbor, Support Vector Machine, and Decision Tree algorithms. All comparison algorithms were tested with the same process flow and datasets.
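A minimal sketch of computing the five indicators, assuming `y_true`, `y_pred`, and per-class probabilities `y_proba` from a fitted model (the toy arrays below are illustrative only; MAUC is approximated here as the macro-averaged one-vs-rest AUC, whereas the Hand-Till measure would use `multi_class="ovo"`):

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, recall_score, roc_auc_score
from imblearn.metrics import geometric_mean_score, specificity_score

# Toy multiclass predictions for illustration only.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 0])
y_pred = np.array([0, 0, 1, 2, 2, 2, 1, 0])
rng = np.random.default_rng(0)
raw = rng.random((len(y_true), 3))
y_proba = raw / raw.sum(axis=1, keepdims=True)   # rows sum to 1

bal_acc = balanced_accuracy_score(y_true, y_pred)
g_mean = geometric_mean_score(y_true, y_pred)             # geometric mean of per-class recall
tpr = recall_score(y_true, y_pred, average="macro")       # macro-averaged TPR (sensitivity)
tnr = specificity_score(y_true, y_pred, average="macro")  # macro-averaged TNR (specificity)
mauc = roc_auc_score(y_true, y_proba, multi_class="ovr", average="macro")
```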

III. RESULTS AND DISCUSSION
In this section, the discussion focuses on evaluating the performance of the XGBoost algorithm and comparing it with other classification models in terms of balanced accuracy, G-mean, sensitivity, specificity, and MAUC. In the test scenario, each dataset was divided into two parts: a training set of 80% of the data and a test set of the remaining 20%. The following subsections present the evaluation results for each indicator on XGBoost and compare them with several other classification models, namely Decision Tree (DT), Gaussian Naïve Bayes (NB), Support Vector Machine (SVM), bagging, and stacking.
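A minimal sketch of this comparison protocol, applying the same 80/20 stratified split to every model (the default hyperparameters are an assumption, since the paper does not report them; `BaggingClassifier(estimator=...)` assumes scikit-learn >= 1.2):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier, StackingClassifier
from sklearn.metrics import balanced_accuracy_score
from xgboost import XGBClassifier

# Synthetic stand-in dataset; 80% training, 20% test, stratified by class.
X, y = make_classification(n_samples=500, n_classes=3, n_informative=6,
                           weights=[0.7, 0.2, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

models = {
    "XGBoost": XGBClassifier(objective="multi:softprob", n_jobs=-1),
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(),
    "NB": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Bagging": BaggingClassifier(estimator=DecisionTreeClassifier()),
    "Stacking": StackingClassifier(estimators=[
        ("knn", KNeighborsClassifier()),
        ("svm", SVC()),
        ("dt", DecisionTreeClassifier())]),
}

for name, clf in models.items():
    clf.fit(X_train, y_train)
    score = balanced_accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: balanced accuracy = {score:.3f}")
```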

A. Balanced Accuracy Score
On the balanced accuracy evaluation metric, the performance of the XGBoost algorithm is relatively good.

Table III compares the balanced accuracy of XGBoost with the DT, Gaussian NB, SVM, bagging, and stacking algorithms. It shows that the balanced accuracy of XGBoost outperforms Decision Tree, Gaussian NB, SVM, bagging, and stacking on the car, contraceptive, glass, hayes-roth, new-thyroid, page-blocks, wine quality-red, and wine quality-white datasets; it even achieves a perfect score of 1.00 on the new-thyroid dataset. However, on the yeast dataset, XGBoost scores lower than the other algorithms. In the table, the values in bold are the highest balanced accuracy values for each dataset.

B. Geometric Mean
The second evaluation indicator is the geometric mean. On this metric, the performance of the XGBoost algorithm is relatively good compared to the others. Table IV compares the geometric mean of XGBoost with the Decision Tree, Gaussian NB, SVM, bagging, and stacking algorithms. It shows that XGBoost outperforms Decision Tree, Gaussian NB, SVM, bagging, and stacking on the car, contraceptive, glass, hayes-roth, new-thyroid, page-blocks, wine quality-red, and wine quality-white datasets; it even achieves a perfect score of 1.00 on the new-thyroid dataset. However, on the yeast dataset, XGBoost scores lower than the other algorithms. In the table, the values in bold are the highest G-mean scores for each dataset.

C. Multiclass Area Under Curve Score (MAUC)
The third evaluation indicator is the MAUC. On this metric, the performance of the XGBoost algorithm is relatively good. Table V compares the MAUC of XGBoost with the Decision Tree, Gaussian NB, SVM, bagging, and stacking algorithms. It shows that XGBoost outperforms Decision Tree, Gaussian NB, SVM, bagging, and stacking on the car, contraceptive, hayes-roth, new-thyroid, page-blocks, and wine quality-white datasets; it even achieves a perfect score of 1.00 on the new-thyroid dataset. However, on the glass, wine quality-red, and yeast datasets, XGBoost scores lower than the other algorithms. The MAUC results are summarized in Table V, where the value in bold is the highest for each dataset.

D. True Positive Rate (TPR)
The fourth evaluation indicator is the TPR. On this sensitivity metric, the performance of the XGBoost algorithm is relatively good. Table VI shows that the TPR of XGBoost outperforms Decision Tree, Gaussian NB, SVM, bagging, and stacking on the car, contraceptive, glass, hayes-roth, new-thyroid, page-blocks, and wine quality-white datasets; it even achieves a perfect score of 1.00 on the new-thyroid dataset. However, on the wine quality-red and yeast datasets, XGBoost scores lower than the other algorithms. The TPR comparison is summarized in Table VI, where the value in bold is the highest for each dataset.

E. True Negative Rate (TNR)
The fifth evaluation indicator is the TNR. On this specificity metric, the performance of the XGBoost algorithm is relatively good. Table VII shows that the TNR of XGBoost outperforms the Decision Tree, Gaussian NB, SVM, bagging, and stacking algorithms on all datasets; it even achieves a perfect score of 1.00 on the new-thyroid dataset. The TNR comparison is presented in Table VII, where the values in bold are the highest for each dataset.
Based on the evaluation using the five indicators above, XGBoost overall performs better than the other models on eight datasets: car, contraceptive, glass, hayes-roth, new-thyroid, page-blocks, wine quality-red, and wine quality-white. On the yeast dataset, however, the XGBoost model still falls short of the other models. What makes XGBoost superior can be seen in the TPR and TNR values; these two indicators are essential in cases of class imbalance, especially for multiclass datasets. When a model produces good TPR and TNR values, it also tends to score well on the other indicators. Therefore, XGBoost can be a solution for multiclass imbalanced datasets while maintaining the natural condition of the data. However, XGBoost has drawbacks for multiclass datasets with more than ten classes.

IV. CONCLUSION
Based on the research conducted, there are three conclusions. First, the proposed XGBoost method achieves better evaluation values than the Decision Tree, Gaussian NB, and SVM classification methods. Second, the XGBoost model achieves better evaluation values than the ensemble bagging method based on the Decision Tree classifier and the stacking method based on the KNN, SVM, and Decision Tree algorithms. Third, the XGBoost method has difficulty dealing with class imbalance in datasets with more than ten classes; the proposed model still needs improvement when handling imbalanced cases with ten or more classes. An example is the yeast dataset, which has the largest number of classes among the nine datasets.
In future research, we suggest combining data-level methods such as undersampling or oversampling with XGBoost; the basic idea is to use sampling to handle datasets with more than ten classes. In addition, we recommend testing with categorical and continuous data types to ensure the reliability of the XGBoost model.
Fig. 1 Research workflow

Fig. 2 XGBoost model illustration
