ON INFORMATICS Classification of Brain Tumors on MRI Images Using DenseNet and Support Vector Machine

Abstract: The brain is a vital organ of the human body. It plays a major role in processing sensory information, producing muscular activity, and performing high-level cognitive functions. Among the most prevalent diseases of the brain is the growth of aberrant tissue in brain cells, which results in the formation of brain tumors. According to data from the International Agency for Research on Cancer (IARC), more than 124,000 people worldwide were diagnosed with brain tumors in 2014, and more than 97,000 people died from the condition. Current research indicates that magnetic resonance imaging (MRI) is the most effective means of detecting brain cancers. Because brain tumors carry a significant mortality risk, this research used a large number of brain tumor MRI images to detect brain cancers with deep learning techniques. A deep learning model, DenseNet 201, paired with Support Vector Machines (SVM), was employed to classify three types of brain tumors: glioma, meningioma, and pituitary. Based on the tests conducted, the best accuracy obtained in this study was 99.65 percent, using a split of 80 percent training data and 20 percent testing data and a dataset oversampled with the SMOTE method.


I. INTRODUCTION
The brain is an important organ of the human body and, like other organs, plays a dominant role in everyday life. With a volume of 1,350 cc, the brain has more than 100 million nerve cells, which control all human activities. The cerebral cortex is the outer part of the human brain; it processes sensory information and produces motor and high-level cognitive activities [1]. The brain is therefore one of the most important organs in supporting human life. If the brain dies, the nerve cells can no longer function properly, resulting in the death of the sufferer.
The most common disease of the brain is a brain tumor, which arises from the growth of abnormal tissue in the brain. One of the most common brain tumors in children and adults is glioma, or astrocytoma [2]. Based on their origin, brain tumors are divided into two types: primary and secondary [3]. A primary brain tumor arises from tumor cells that originate in the brain tissue itself, whereas a secondary brain tumor results from cancerous tissue in other organs of the body that spreads to the brain. Brain tumors grow because of genetic mutations in brain cells, although the cause of these mutations remains uncertain. However, several factors can increase a person's risk of developing a brain tumor, such as age, heredity, and radiotherapy. The most lethal type of primary brain tumor, which commonly occurs in adults, is glioma [4].
Based on data from the International Agency for Research on Cancer, more than 124,000 people suffered from brain tumors in 2014 and more than 97,000 of them died, indicating that this disease is dangerous for the survival of the sufferer. At present, most disease diagnoses rely on technology to support examinations by medical personnel, owing to the high accuracy of current technology and rapid technological progress. Magnetic Resonance Imaging (MRI) is the most commonly employed technology for diagnosing brain cancer and is one of the most powerful and versatile imaging methods in clinical medicine. It captures images of organs in the patient's body using a strong magnetic field around the patient. During the last three decades, MRI has seen numerous developments that yield practical and quantitative information, such as network microarchitecture and perfusion flow [5]. MRI is used to diagnose brain tumors because, compared to CT scans, it does not pose a threat to vulnerable people. In addition, the quality of the obtained images has been improving, facilitating early diagnosis of the disease. MRI images have also become more readily available on the internet, encouraging the researchers of this study to create products that assist medical workers.
During diagnosis by medical personnel, however, decision-making and conclusions on brain tumor patients take a long time, whereas treating brain tumor patients requires immediate action [2]. Therefore, with numerous sources providing MRI images of brain tumor patients, more researchers are challenged to create systems capable of classifying brain tumors. Classification systems built with machine learning and deep learning methods frequently use MRI images because they produce results quickly and accurately. The most frequently used machine learning methods include KNN, Neural Network, SVM, and Random Forests, while the most commonly used deep learning algorithm is the Convolutional Neural Network (CNN).
In recent years, several studies have specifically aimed at classifying brain tumors using MRI images of patients. Over the last five years, numerous studies have confirmed that brain tumors can be classified from MRI images with faster results. One study with good accuracy, conducted by Sultan et al. [6], applied the CNN algorithm in two scenarios, using different architectural models for different datasets. The first scenario employed a dataset collected from Nanfang Hospital and General Hospital, Tianjin Medical University, China, from 2005 to 2010. The dataset is divided into three classes: meningioma, glioma, and pituitary brain tumors. The model established for this scenario achieved an average accuracy of 96.13%. The second scenario employed a dataset for classifying glioma brain tumors of grades I, II, III, and IV, obtained from The Cancer Imaging Archive (TCIA) repository and known as the Repository of Molecular Brain Neoplasia Data (REMBRANDT) dataset. The accuracy obtained in the second scenario was 98.7%.
The next relevant study was conducted by Gumaei et al. [7], who used the brain tumor MRI image dataset created by Cheng, comprising 3064 brain MRI images divided into 1426 glioma images, 708 meningioma images, and 930 pituitary images. The Regularized Extreme Learning Machine (RELM) method used in that study resulted in an accuracy of 94.23%.
The third study, conducted in 2019 by Deepak et al. [8], also reported excellent accuracy: it obtained 98% using a pretrained CNN model, GoogLeNet, to extract features from MRI images of brain tumors and Support Vector Machines (SVM) to classify them. The study used the dataset created by Cheng, obtained from figshare [9].
Swati et al. [10] also used transfer learning to classify MRI images of brain tumors, with VGG19 obtaining the best accuracy of 94.82%. The study used a dataset of 3064 images obtained from 233 brain tumor patients [9]. Another related study by Noreen et al. [11] used the same dataset [9] with InceptionV3 and DenseNet201, which obtained accuracies of 99.34% and 99.51%, respectively.
The results of these previous studies indicate that the methods are accurate and worth developing further; the most recent research, using InceptionV3 and DenseNet201, achieved accuracies above 99%. Based on the stated problems, the authors of this study aim to create a model that can classify brain tumors from Magnetic Resonance Imaging (MRI) images through feature extraction from each image using deep learning methods.

A. Dataset
The dataset used in this study is the brain MRI image dataset created by Jun Cheng from 233 brain tumor patients. The data were collected from 2005 to 2010 at Nanfang Hospital, Guangzhou, China, and General Hospital, Tianjin Medical University, China [10]. An example of these three techniques is illustrated in Figure 1 [12]. The dataset is divided into three classes, meningioma, glioma, and pituitary, as visualized in Figure 2, comprising 708 meningioma images, 1426 glioma images, and 930 pituitary images. Given these numbers, the Cheng dataset used in this study is considered imbalanced. The images, originally 512x512 pixels, were resized to 224x224 pixels for better performance. The resized dataset was divided into two parts: 80% training data and 20% testing data. The split was performed randomly with the sklearn library using a random state of 21. Features were first extracted from the dataset using the DenseNet 201 model, and these features serve as the parameters for determining the class of each brain tumor MRI image. Each extracted image is then represented by 94,080 feature values.
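The 80/20 split described above can be sketched as follows. This is a minimal illustration that assumes the sklearn function in question is `train_test_split` (the text names only the library and the random state). The label list is a placeholder built from the three reported class sizes, not the actual image data, and no stratification is applied because the text does not mention any.

```python
from sklearn.model_selection import train_test_split

# Placeholder labels built from the three class sizes reported for the dataset
# (integer class ids stand in for the real images and their tumor types).
class_sizes = [1426, 708, 930]
labels = [class_id for class_id, n in enumerate(class_sizes) for _ in range(n)]

# 80% training / 20% testing with the random state given in the text.
# Stratification is not mentioned in the text, so none is applied here.
train_labels, test_labels = train_test_split(
    labels, test_size=0.2, random_state=21
)

print(len(train_labels), len(test_labels))
```

With 3064 images, a split of this kind leaves 2451 images for training, which matches the total of the per-class training counts reported in the first test scenario.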

B. SMOTE
The dataset used in this study is imbalanced: the amount of data differs significantly among the three classes. Several studies apply oversampling to address the problem of imbalanced datasets by increasing the data in the smaller classes until they match the most dominant class. One oversampling method is SMOTE (Synthetic Minority Over-sampling Technique), proposed by Chawla in 2002, in which the minority class is oversampled by creating synthetic data based on the K-nearest neighbors [12].
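The core idea of SMOTE, creating a synthetic sample on the line segment between a minority sample and one of its nearest minority-class neighbors, can be sketched in a few lines. This is an illustrative sketch only, not the implementation used in the study: the function name `smote_sample`, the value of `k`, the seed, and the two-dimensional toy vectors are all assumptions made for the example.

```python
import random

def smote_sample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Sort the other samples by squared Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical 2-D feature vectors for a minority class of four samples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sample(minority, n_new=6, k=2)
print(len(minority) + len(new_points))  # minority class size after oversampling
```

Because every synthetic point is an interpolation between two real minority samples, the new data stay inside the region already occupied by that class rather than being exact copies.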
C. Proposed Model
1) DenseNet 201:
DenseNet 201 is one of the models used for transfer learning, a deep learning approach in which models pretrained on previous problems are reused to solve the problem at hand [13]. Pretrained CNN models include AlexNet, VGGNet, LeNet, and DenseNet [14]-[18]. This study applies a pretrained DenseNet201 to extract features from brain MRI images, as illustrated in Figure 3. Features of the brain tumor dataset were extracted by DenseNet 201 through its lower and upper dense blocks, four dense blocks in total, distinguished by the number of layers in each block [13].
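The figure of 94,080 values per extracted image given in the Dataset subsection can be reproduced with a short calculation. The sketch below assumes the standard DenseNet-201 configuration, which the text does not spell out: four dense blocks of 6, 12, 48, and 32 layers, a growth rate of 32, 64 channels after the stem, and transition layers that halve both the channel count and the spatial size.

```python
# Assumed standard DenseNet-201 configuration (not stated in the text):
# 64 stem channels, growth rate 32, transition layers that halve the channels.
block_layers = [6, 12, 48, 32]   # layers in each of the four dense blocks
growth_rate = 32
channels = 64
size = 224 // 4                  # stem convolution and pooling each halve 224 -> 56

for i, n_layers in enumerate(block_layers):
    # Every layer in a dense block appends growth_rate feature maps.
    channels += n_layers * growth_rate
    if i < len(block_layers) - 1:
        # Transition layer: halve the channels and the spatial size.
        channels //= 2
        size //= 2

feature_length = size * size * channels
print(size, channels, feature_length)
```

Under these assumptions a 224x224 input ends as a 7x7 map with 1920 channels, and flattening it gives 7 x 7 x 1920 = 94,080 values.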

3) Max-Pooling Layer:
This layer divides its input into small regions, for example of four elements, and keeps only the largest value in each region. It reduces the number of parameters and speeds up computation [6]. The Max-Pooling layer is illustrated in Figure 5.
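As a small worked example of max pooling, the sketch below assumes a 2x2 window with a stride of 2, so that each group of four elements is replaced by its largest value. The function name `max_pool_2x2` and the 4x4 input are illustrative and not taken from the study.

```python
def max_pool_2x2(image):
    """Return the 2x2, stride-2 max pooling of a 2-D list with even sides."""
    pooled = []
    for r in range(0, len(image), 2):
        row = []
        for c in range(0, len(image[0]), 2):
            # Keep only the largest of the four values in this window.
            window = (image[r][c], image[r][c + 1],
                      image[r + 1][c], image[r + 1][c + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [7, 2, 9, 4],
    [1, 0, 3, 5],
]
print(max_pool_2x2(feature_map))  # [[6, 2], [7, 9]]
```

The 4x4 input shrinks to 2x2, which is how the layer cuts the amount of data passed to later layers by a factor of four.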

4) Fully Connected Layer:
This layer is located at the end of the CNN and acts as a classifier. The class label of the tested data is determined by the fully connected layer.

5) Support Vector Machine (SVM):
Support Vector Machines (SVM) is a supervised learning method typically applied to classification and regression [19]-[21]. Compared with other classification methods, SVM offers a better approach to both linear and non-linear problems. SVM was chosen for this study following a previous study [8]. The SVM classifier was paired with a deep CNN for feature extraction to produce good accuracy.
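The classification step can be sketched with the `SVC` class from sklearn, using the RBF kernel and C = 10 reported for the first test scenario. The choice of `SVC` is an assumption (the text names the parameters but not the class), and the tiny two-dimensional vectors below are hypothetical stand-ins for the DenseNet 201 features.

```python
from sklearn.svm import SVC

# Hypothetical 2-D feature vectors standing in for the extracted image features.
features = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],          # class 0
    [10.0, 10.1], [10.2, 9.9], [9.9, 10.0],      # class 1
    [-10.0, 10.0], [-9.8, 10.2], [-10.1, 9.9],   # class 2
]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

# RBF kernel and C = 10, the values reported for the SVM classifier.
classifier = SVC(kernel="rbf", C=10)
classifier.fit(features, labels)

predictions = classifier.predict([[0.1, 0.1], [10.0, 10.0], [-10.0, 10.0]])
print(predictions.tolist())
```

In the study's setting the input rows would be the flattened feature vectors of each MRI image rather than toy points, with one label per tumor class.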

III. RESULT AND DISCUSSION
At this stage, four tests were conducted. The scenarios differ fundamentally in the use of two different classifiers, each tested on the imbalanced and the balanced dataset. The imbalanced dataset is used in the first and second test scenarios, while the balanced dataset is used in the third and fourth. The model parameters of both the DenseNet classifier and the SVM classifier were also varied to obtain the best model performance.

A. Test Scenario
The first test scenario classifies the imbalanced brain tumor MRI image dataset, with features extracted using the DenseNet 201 model and classification performed by the Support Vector Machines (SVM) model. The training data comprise 1143 glioma images, 740 pituitary images, and 568 meningioma images. The performance of the model is determined using the test data, the 20% of each class that was split off. The SVM classifier uses the rbf kernel and C = 10.
The dataset used in the second test scenario is the same as in the first scenario. The DenseNet 201 model is modified at the last dense layer. Training of the DenseNet 201 model in this scenario uses 200 epochs, the Adam optimizer, a learning rate of 0.0001, a decay rate of 0.0001/16, a batch size of 16, a SoftMax activation function at the last layer, and a categorical cross-entropy loss function. The accuracy results of the second scenario test are illustrated in Figure 6.
The best accuracy obtained in this second scenario is 98.04% during model training, but when the final model is tested with the test data, the accuracy obtained decreases to 97.22%. The values of loss, precision, and recall during the second scenario model training are presented in Figure 7, Figure 8, and Figure 9, respectively.
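The precision, recall, and f1-score values discussed for the test scenarios are computed per class from a confusion matrix. The sketch below shows that calculation; the function name `per_class_metrics` and the counts in the matrix are hypothetical and are not the results of this study.

```python
def per_class_metrics(confusion, class_names):
    """Compute precision, recall and F1 per class from a confusion matrix
    whose rows are true classes and whose columns are predicted classes."""
    report = {}
    for i, name in enumerate(class_names):
        true_positive = confusion[i][i]
        predicted = sum(row[i] for row in confusion)   # column total
        actual = sum(confusion[i])                     # row total
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[name] = (precision, recall, f1)
    return report

# Hypothetical counts for illustration only; these are not the study's results.
classes = ["glioma", "meningioma", "pituitary"]
confusion = [
    [90, 10, 0],
    [5, 40, 5],
    [0, 0, 50],
]
for name, (p, r, f1) in per_class_metrics(confusion, classes).items():
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Reporting the metrics per class, rather than a single accuracy figure, is what makes a weak class visible when the dataset is imbalanced.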
The results in the classification report of the second test could not surpass the performance of the first test. The decrease in precision, recall, and f1-score for the meningioma class hinders the validation performance of the second-scenario model, leaving it below the first scenario, even though the validation accuracy reached 98.04% during model training.
The third test scenario was performed in the same way as the first. The fundamental difference lies in the dataset: the initially imbalanced dataset was oversampled with the SMOTE method, giving the minority and majority classes similar amounts of data. The SVM parameters are the same as in the first scenario so that the test results can be compared directly. The increase in precision, recall, and f1-score in all tested classes is significant, confirming that the third scenario with a balanced dataset performs better than the two previous scenarios.
In the last test scenario, the DenseNet 201 model is the same as in the second test scenario. The initially imbalanced dataset was oversampled, balancing the amount of data in each class. Training in this scenario uses 200 epochs, the Adam optimizer, a learning rate of 0.0001, a decay rate of 0.0001/16, a batch size of 16, a categorical cross-entropy loss function, and a SoftMax activation function in the last layer. The accuracy obtained during model training is illustrated in Figure 10.
In both training and testing, the fourth-scenario model, run with a balanced dataset, performs better than all the previous scenarios. Training, initially set to 200 epochs, can be stopped by a callback function once the desired accuracy has been reached, which happened in a fairly short time. The highest accuracy obtained in this fourth scenario is 99.65%, at which point training was stopped after 59 epochs. The loss, precision, and recall graphs in Figure 11, Figure 12, and Figure 13 additionally indicate that this model performs better with a balanced dataset.
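The callback-based stopping described above can be sketched as a loop that ends training at the first epoch whose accuracy reaches a target. The text does not give the threshold or the callback's implementation, so the function name `train_with_accuracy_stop`, the target of 0.99, and the accuracy curve are all hypothetical.

```python
def train_with_accuracy_stop(epoch_accuracies, target, max_epochs=200):
    """Run up to max_epochs and stop at the first epoch whose accuracy
    reaches the target. Returns (epochs_run, best_accuracy)."""
    best = 0.0
    epochs_run = 0
    for epoch, accuracy in enumerate(epoch_accuracies[:max_epochs], start=1):
        epochs_run = epoch
        best = max(best, accuracy)
        if accuracy >= target:
            break  # the callback's job: end training early
    return epochs_run, best

# Hypothetical accuracy curve; a real run would measure this each epoch.
curve = [0.90, 0.95, 0.98, 0.992, 0.9965, 0.997]
print(train_with_accuracy_stop(curve, target=0.99))  # (4, 0.992)
```

Stopping this way saves the remaining epochs once the goal is met, at the cost of never seeing whether later epochs would have improved further.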

B. Evaluation of Test Results
The method proposed in this study uses transfer learning to extract features from MRI images of brain tumors and evaluates the performance of a model trained on the features extracted by the DenseNet 201 model. The Jun Cheng dataset was created from 233 brain tumor patients and collected from 2005 to 2010. The patient data were obtained from two hospitals: Nanfang Hospital, Guangzhou, China, and General Hospital, Tianjin Medical University, China. The dataset is divided into three classes: meningioma, glioma, and pituitary. However, the amount of data per class differs considerably, with the majority glioma class amounting to 1426 images, the meningioma class to 708 images, and the pituitary class to 930 images.
Previous studies on similar topics used the Jun Cheng dataset and obtained fairly good results. The first, conducted by Gumaei et al. [7], obtained 94.23% accuracy. The highest result among the previous studies was 99.51%, obtained by Noreen et al. [11] with the DenseNet 201 model using 100 epochs, a learning rate of 0.0001, a batch size of 20, the Adam optimizer, and categorical cross-entropy as the loss function. The ratio of training data to test data in this study is the same as in that research: 80% training data and 20% testing data. The DenseNet 201 model used in this study obtained its best accuracy, 99.65%, in the fourth test scenario, in which the dataset was oversampled to balance the amount of data in each class. The results of all test scenarios are presented in Table 1.
The best performance in this study was obtained in the fourth test scenario, which uses the DenseNet 201 model and the balanced dataset. A comparison of model performance with previous research is presented in Table 2. The best-performing previous research was conducted by Noreen et al. [11], whose DenseNet 201 model obtained an accuracy of up to 99.51%. The difference between the DenseNet 201 model in the current study and the previous research lies in the modification of the last dense layer and the use of a dataset balanced with the SMOTE method. The authors use the DenseNet 201 model because it obtains the best performance compared to the models used in other studies.

IV. CONCLUSION
Based on all test scenarios, it is concluded that the DenseNet 201 model can classify MRI images of brain tumors with excellent performance. Using the DenseNet 201 model to extract features from the MRI images, combined with the SVM classifier, gives good model performance. This study used the same model and dataset as previous research with some changes: the SMOTE method was used to turn the imbalanced dataset into a balanced one, and the last dense layer was modified to optimize model performance. With the balanced dataset and the DenseNet 201 model, the highest result obtained in this study was 99.65%, higher than the result obtained with the imbalanced dataset in the previous study. Thus, from all scenarios in this study, it is concluded that balancing the dataset can exceed the results of previous studies that used the same model, DenseNet 201, because the balance of data in each class affects the performance of the model during classification.