Deep Learning Approach for Prediction of Brain Tumor from Small Number of MRI Images

Abstract: The computer industry is moving steadily towards machine intelligence. Deep learning is a subfield of machine learning (ML), which is itself a branch of artificial intelligence (AI). It mimics the functioning of the human brain in analyzing data and generating patterns for making decisions. Deep learning is gaining much attention because of its superior precision when trained with large amounts of data. This study uses the deep learning approach to predict brain tumors from magnetic resonance imaging (MRI) medical images. The study follows the CRISP-DM methodology and uses three deep learning algorithms, VGG-16, Inception-V3, and MobileNet-V2, implemented in Python. The algorithms are trained and tested on a small number of MRI images, since the dataset has only 98 image samples of benign and 155 image samples of malignant brain tumors. The main objective of this work is therefore to identify the deep learning algorithm that performs best on small datasets. The performance evaluation is based on confusion matrix criteria: accuracy, precision, and recall, among others. Overall, MobileNet-V2 achieves the highest classification results, with an accuracy of 86.00%. Inception-V3 achieves the second-highest accuracy, 84.00%, and VGG-16 the lowest, 79.00%. Thus, this work shows that DL technology can be applied to brain tumor prediction in the medical field even with a small dataset.


I. INTRODUCTION
Deep learning is a type of artificial intelligence (AI) that mimics the functioning of the human brain in analyzing data and generating patterns for making decisions [1]. According to [2], the ability to analyze a large number of features makes deep learning efficient on unstructured data, but a large amount of data is needed for deep learning algorithms to become powerful predictors. A brain tumor is a mass of anomalous cells that grows in the brain [3]. Usually, the deep learning approach is used to predict brain tumors from many medical images. Such datasets contain benign and malignant brain tumor samples that are compared in the classification [4]-[7]. As a rule of thumb, deep learning needs a minimum of 1000 images per class for image classification [8]. The problem is how to configure a model to deal with a smaller number of brain tumor samples. Predicting brain tumor cases with deep learning is very challenging when the sample data is small: with too few samples, the algorithm does not learn well and, in the end, does not produce a significant result.
A large body of research has been dedicated to using modern technologies such as machine learning (ML) and deep learning (DL) in the medical field [9]-[12]. Six papers are discussed in this section. Of these previous projects, two use the same dataset as the current project [13], [14]. The previous study applied three algorithms: ResNet-50, Inception-V3, and VGG-16 [15]. The dataset obtained from Navoneel is used for training and testing in that work. The dataset consists of 253 brain MRI samples: 155 malignant tumor samples and 98 benign tumor samples. The implementation covers various measurements and different aspect ratios in the crop normalization process. The pre-trained model requires the image dimensions to be 224 × 224 × 3. Hence, the images have to be resized to this predefined format. According to the analytical findings, ResNet-50 obtains the maximum test accuracy of 95% with a false-negative rate of zero. In contrast, VGG-16 achieves a decent accuracy of 90%. In conclusion, the authors suggest that hyperparameter adjustments and better processing techniques could be applied [16].
In [14], the models applied in the implementation of transfer learning are VGG-16, ResNet-50, and Inception-V3. Backpropagation learning can be run in pattern (sequential or incremental) mode or in batch mode. Pattern mode needs less local storage and is quicker, but it tends to be trapped at a local minimum. It handles online learning, which is stopped after a certain time has elapsed or when the errors surpass the required amount. Subsequently, the authors use a supervised network or reinforcement learning to improve the training of their models. Among the Keras versions, ResNet-50 scores the best overall accuracy, and models' resolution has the best validation accuracy of 91.09%. Also, processing the dataset without transfer learning takes 40 minutes, while processing with transfer learning takes 20 minutes (i.e., 50% less time).
The isocitrate dehydrogenase (IDH) status of brain tumors was identified via DL algorithms in [17]. That study uses 52 subjects as the test dataset and three models, ResNet-50, DenseNet-161, and Inception-v4, to predict IDH status. According to the findings, the three models are evaluated with 5-fold training, testing, and validation. The suggested models aim to require only limited preprocessing and to achieve high accuracy without tumor segmentation or region-of-interest extraction. The DenseNet-161 model outperforms both the ResNet-50 and Inception-v4 models, scoring 90.5% accuracy with a standard deviation of ± 1.0% and an AUC of 0.95. Unlike the ResNet-50 model, the DenseNet-161 architecture carries the data from all previous layers and adds the information to the next layer. This procedure helps the network use the data from various layers before passing it to the next layers. Addressing the problem of data leakage is the biggest contribution of that study.
In [18], the volumetric and location characteristics are first retrieved from each segmentation file given in the dataset. Next, the histogram features are extracted and used to train the ML algorithms to classify the data. These combined attribute vectors with labels are used to train various ML algorithms. Overall, the classification accuracy is impaired by the presence of numerous undesired classification classes, as shown in Table I. The extracted data represents the statistical and intensity texture features, and several ML algorithms are trained using the texture features. With 10-fold cross-validation, the classification accuracy based on three classes did not exceed 46.0%. The classification precision increased to 65.0% when K-Nearest Neighbors (KNN) and Support Vector Machine (SVM) classifiers were used.
Reinforcement learning is used in a Deep Q Network (DQN) model to process MRI data in [19]. A collection of 100 MRI images of brain lesions is used, of which 70 images are used for training and 30 images for testing. The success rate of the reinforcement learning predictions increases steadily during testing, whereas the deep learning results quickly diverge. On the test set, reinforcement learning predicts lesion positions with an accuracy of 85.0%, compared to an accuracy of about 7.0% for the supervised deep network. In conclusion, lesions can be predicted with high accuracy using reinforcement learning for such a limited training set.
In [20], Capsule networks (CapsNets) are integrated with Bayesian algorithms to create a BayesCap model. The study uses a dataset of 3,064 brain MRI images to test the proposed BayesCap, Bayesian CNN, and CapsNets for validation. The diagnosis is one of three tumor types: meningioma, glioma, and pituitary. In each bootstrap, 80% of the data is sampled with replacement as the training set. A total of 30 iterations are performed to assess the general probability of the CapsNets as well as to calculate the 95% confidence interval (CI) of the results. Using the same data as the proposed BayesCap model, non-Bayes CapsNets are also trained, resulting in lower precision than the initial precision obtained after 500 epochs, which is 68.3% with CI (67.1%, 69.5%). With a batch size of 16, the models in each bootstrap were trained for 500 epochs. However, this accuracy is less than the 78% obtained using the whole brain MRI images with the non-Bayesian CapsNet, which is expected because the Bayesian variant is aimed at learning the posterior of the model weights and capturing uncertainty instead of increasing accuracy. By gathering more image data, such variability can be decreased. The results of that analysis demonstrate that the precision can be steadily increased by filtering out the uncertain predictions at various thresholds. Table I shows the summary of each related work and its accuracy rate.
In short, the review shows that DenseNet-161 is the best approach for DL prediction research on a small dataset, achieving an accuracy of 90.5%. On the other hand, Inception-V3 appears unsuitable for training and testing on small datasets. This is further supported by the findings of [1] and [15], in which Inception-V3 achieves 50.0% and 55.0% accuracy, respectively.
The objectives of this study are defined in two points: to apply three types of DL models: VGG-16, Inception-V3, and MobileNet-V2 on brain tumor datasets of MRI images and to assess and analyze the performance of the DL models using accuracy, precision, and recall among other criteria. Therefore, this study uses three DL algorithms implemented in Python to predict brain tumors from a small set of medical images of MRI type.

II. MATERIAL AND METHOD
Data mining is a popular trend for exploring ML and DL methods and techniques to discover new patterns and knowledge from different data sources. Model building and pattern recognition are two fundamental elements of data mining, and much of its emphasis is on filtering and reducing data. While certain sub-disciplines of statistics have studied special cases of this problem, most of the data mining work on pattern detection to date has been computational and focuses on improving and developing algorithms.
The CRISP-DM methodology was produced by a consortium originally formed by DaimlerChrysler, SPSS, and NCR [21]. CRISP-DM stands for CRoss-Industry Standard Process for Data Mining, a common technique for improving the performance of data mining projects [22]. The methodology defines a project as a cyclic, non-rigid six-phase process that helps support business decisions, as shown in Fig. 1. It facilitates the creation and application of data mining models to be used in real-world settings.
Fig. 1 The phases of the CRISP-DM method [23]
Business understanding is the phase that requires understanding the business objectives, turning them into data mining problems, and developing project plans. The project background should be investigated from various sources to determine how the business objectives can be satisfied. In this study, business understanding refers to the project of automating tumor type diagnosis, since the study does not involve a real industry. We defined the project's objectives at this stage: (i) to implement three types of deep learning models, VGG-16, Inception-V3, and MobileNet-V2, on MRI image datasets of brain tumors; and (ii) to test and compare the prediction of the tumor images using accuracy, precision, and recall parameters based on the results of the DL models. Subsequently, the data understanding stage begins with the compilation of initial data and access to the dataset in order to explore, describe, examine, and verify its usefulness and quality.
Data preparation is the stage of deciding on the dataset, the collection of features to be considered, and the suitable partitioning of the dataset. The dataset is examined through data analysis, and several preprocessing steps are implemented to verify its content and make it ready for modeling. The data preparation phase involves selecting, cleaning, constructing (data attributes), reformatting, and integrating (merging) the data. Since the data in this work consists of images and the method used is DL, the data does not need to be numerically prepared, and only the quality of the images should be examined.
Thereafter, the relevant modeling methods, algorithms, or combinations are chosen and prepared in the modeling phase. The parameter settings of the algorithms are also determined. The algorithms used in this phase are VGG-16, Inception-V3, and MobileNet-V2. The performance of the algorithms is formally measured based on accuracy, precision, and recall as standard data mining criteria. The processes of this phase are carried out iteratively until the selected model satisfies the testing requirements. The performance result for each algorithm is summarized, and the method with the highest performance is identified. Before the final implementation, a scientifically high-quality model has already been identified in the development phase. Lastly, it is important to analyze the product's final implementation closely. The steps taken to develop the best model should be reviewed to ensure that the business objectives are properly achieved.

A. Brain Tumor MRI Dataset
The dataset used in this study was taken from the Kaggle website, namely Brain Tumor MRI Images for Brain Tumor Detection by Navoneel Chakrabarty [24]. The total number of samples was 253, of which 98 were benign samples and 155 were malignant brain tumor samples. Fig. 2 (a) shows "No" samples (benign), and Fig. 2 (b) shows "Yes" samples (malignant). The resolution of each image is between 88x88 dpi and 300x300 dpi, and the images are two-dimensional (2D).
Fig. 2 Samples of brain tumor MRI dataset [24]
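To make the class imbalance concrete, the following is a minimal sketch that computes the majority-class baseline accuracy and a stratified train/test partition from the reported class counts. The 80/20 split ratio, the random seed, and the function name `stratified_split` are assumptions for illustration only; they are not taken from the study.

```python
import random

# Class counts reported for the dataset: 98 "No" and 155 "Yes" images.
COUNTS = {"no": 98, "yes": 155}


def stratified_split(counts, test_fraction=0.2, seed=42):
    """Split sample indices per class so both subsets keep the class ratio.

    test_fraction and seed are illustrative assumptions, not values
    taken from the study.
    """
    rng = random.Random(seed)
    train, test = [], []
    for label, n in counts.items():
        indices = list(range(n))
        rng.shuffle(indices)
        n_test = round(n * test_fraction)
        test += [(label, i) for i in indices[:n_test]]
        train += [(label, i) for i in indices[n_test:]]
    return train, test


total = sum(COUNTS.values())
# Accuracy obtained by always predicting the majority class.
majority_baseline = max(COUNTS.values()) / total

train_set, test_set = stratified_split(COUNTS)
print(f"total images: {total}")
print(f"majority-class baseline accuracy: {majority_baseline:.4f}")
print(f"train size: {len(train_set)}, test size: {len(test_set)}")
```

Any classifier evaluated on this dataset should be read against the majority-class baseline, since always predicting the larger class is already correct for roughly 61% of the images.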

B. Deep Learning Algorithms
This section outlines the three DL algorithms to be used in this image classification experiment: VGG-16, Inception-V3, and MobileNet-V2.

1) VGG-16:
VGG-16 is a convolutional neural network that has 16 weight layers. It was proposed by Simonyan and Zisserman from the University of Oxford in their paper "Very Deep Convolutional Networks for Large-Scale Image Recognition" [25]. In this experiment, the pre-trained VGG-16 convolutional neural network model was adjusted to prevent overfitting by freezing some layers because the dataset is small [26]. The network contains 13 convolutional layers and 3 fully connected layers, and the final layer feeds the SoftMax output layer, which strengthens the ability to learn hidden features [27].
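The benefit of freezing layers on a small dataset can be illustrated by counting parameters. The sketch below is an assumption-laden illustration rather than the study's implementation: it uses the standard VGG-16 layer configuration (configuration D of the original paper, with a 224x224x3 input and 1000 ImageNet classes) and a hypothetical two-class head attached directly to the frozen convolutional features.

```python
# VGG-16 (configuration D in Simonyan and Zisserman): output channels of the
# 13 convolutional layers (all 3x3 kernels) and sizes of the 3 dense layers.
CONV_CHANNELS = [64, 64, 128, 128, 256, 256, 256,
                 512, 512, 512, 512, 512, 512]
DENSE_UNITS = [4096, 4096, 1000]   # 1000 = ImageNet classes
FLATTENED = 7 * 7 * 512            # feature map size for a 224x224x3 input


def conv_params(channels, in_channels=3, kernel=3):
    """Weights plus biases of a stack of kernel x kernel convolutions."""
    total = 0
    for out_channels in channels:
        total += kernel * kernel * in_channels * out_channels + out_channels
        in_channels = out_channels
    return total


def dense_params(units, in_features):
    """Weights plus biases of a stack of fully connected layers."""
    total = 0
    for out_features in units:
        total += in_features * out_features + out_features
        in_features = out_features
    return total


conv_total = conv_params(CONV_CHANNELS)
dense_total = dense_params(DENSE_UNITS, FLATTENED)
# Hypothetical two-class head placed directly on the frozen features.
binary_head = dense_params([2], FLATTENED)

print("weight layers:", len(CONV_CHANNELS) + len(DENSE_UNITS))
print("convolutional parameters:", conv_total)
print("full ImageNet model parameters:", conv_total + dense_total)
print("trainable parameters with frozen base + 2-class head:", binary_head)
```

With the convolutional base frozen, only the head is fitted to the 253 images, which is a far smaller estimation problem than updating every weight in the network.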
2) Inception-V3:
In Inception-V3, a basic convolution block replaces the convolution layer and max pooling for feature extraction. Inception-V3 frequently employs 1x1 convolutional kernels to reduce the number of feature channels and speed up training. The Inception-V3 model consists of three elements: the basic convolution block, the enhanced Inception module, and the classifier [28]. Owing to this architecture, Inception-V3 achieves strong object recognition performance, and the model is commonly used for transfer learning [29].
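In Inception-style modules, the usual way to reduce feature channels is a 1x1 convolution placed before a larger kernel. The following minimal sketch compares the weight counts of the two designs. The channel sizes (192 in, 32 out, reduced to 16) are illustrative assumptions and do not describe a specific Inception-V3 layer.

```python
def conv_weights(in_channels, out_channels, kernel):
    """Number of weights (biases ignored) in a kernel x kernel convolution."""
    return in_channels * out_channels * kernel * kernel


# Illustrative sizes only: 192 input channels, 32 output channels, 5x5 kernel,
# with an optional 1x1 bottleneck that first reduces 192 channels to 16.
IN_CH, OUT_CH, REDUCED, K = 192, 32, 16, 5

direct = conv_weights(IN_CH, OUT_CH, K)
bottleneck = conv_weights(IN_CH, REDUCED, 1) + conv_weights(REDUCED, OUT_CH, K)

print("direct 5x5 convolution:", direct)
print("1x1 reduction followed by 5x5:", bottleneck)
print(f"reduction factor: {direct / bottleneck:.1f}x")
```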

3) MobileNet-V2:
MobileNet is a CNN-based model that is commonly used to recognize objects in images. As illustrated in Fig. 3, the fundamental advantage of the MobileNet architecture is that it requires significantly less computation than a classic CNN model, making it well suited to mobile devices and PCs with lower processing capabilities [31]. The core structure of MobileNet is built on depthwise separable convolutions, a form of factorized convolution that splits a standard convolution into a depthwise convolution and a pointwise convolution [32]. A pointwise convolution is a 1x1 convolution. The depthwise layer feeds the pointwise layer through a rectified linear unit (ReLU). A resolution multiplier is used to reduce the dimensions of the input image and, by the same factor, the internal representation of each layer.
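The computational saving of the MobileNet design can be sketched by comparing the multiply-accumulate cost of a standard convolution with that of a depthwise convolution followed by a 1x1 pointwise convolution. The layer sizes below are illustrative assumptions, and the sketch covers only the depthwise separable factorization, not the further refinements specific to MobileNet-V2.

```python
def standard_conv_cost(kernel, in_ch, out_ch, feature_size):
    """Multiply-accumulate count of a standard convolution
    (stride 1, output feature map of feature_size x feature_size)."""
    return kernel * kernel * in_ch * out_ch * feature_size * feature_size


def separable_conv_cost(kernel, in_ch, out_ch, feature_size):
    """Cost of a depthwise convolution (one kernel x kernel filter per input
    channel) followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * in_ch * feature_size * feature_size
    pointwise = in_ch * out_ch * feature_size * feature_size
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 64 -> 128 channels, 56x56 feature map.
K, M, N, F = 3, 64, 128, 56
std = standard_conv_cost(K, M, N, F)
sep = separable_conv_cost(K, M, N, F)
ratio = sep / std  # equals 1/N + 1/K**2

print("standard convolution MACs:", std)
print("depthwise separable MACs:", sep)
print(f"cost ratio: {ratio:.4f}")
```

The ratio depends only on the number of output channels and the kernel size, so for 3x3 kernels the separable form needs roughly one eighth to one ninth of the computation.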

C. Evaluation Metrics
All values of two attributes are considered in this classification: tumor and grey matter features. The classification accuracy of the VGG-16, Inception-V3, and MobileNet-V2 algorithms is compared at the end of this study. In addition, a confusion matrix containing True Positive, True Negative, False Positive, and False Negative values is used to calculate the accuracy, precision, and recall parameters in this study [26]-[33].
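As a minimal sketch of how these parameters follow from the four confusion-matrix counts, the function below computes accuracy, precision, recall, specificity, and F1 score using their standard definitions. The function name and the example counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute standard metrics from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # also called sensitivity
    specificity = ratio(tn, tn + fp)
    f1 = ratio(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, tn=30, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```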

1) Accuracy:
Accuracy is the proportion of correctly predicted data points out of all data points in the dataset:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
where TP, TN, FP, and FN stand for True Positive, True Negative, False Positive, and False Negative.

2) Precision:
Precision is the ratio of correctly predicted positives to all samples predicted as positive:
$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$
where TP is True Positive and FP is False Positive.

3) Recall:
Recall is the same as sensitivity: the ratio of the positive samples correctly predicted as positive to the actual number of positive samples:
$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
where TP is True Positive and FN is False Negative.

4) F1 Score:
The F1 score, the harmonic mean of precision and recall, is considered a better predictor of the performance of the classifier than the standard accuracy test.

Table III shows the sensitivity, specificity, precision, recall, F1 score, and accuracy for VGG-16, Inception-V3, and MobileNet-V2. The test results show that MobileNet-V2 has the highest accuracy, 86.00%, Inception-V3 has the second-highest accuracy, 84.00%, and VGG-16 has the lowest accuracy, 79.00%. Subsequently, Fig. 4 illustrates the accuracy and error changes over the 100 epochs employed. It can be observed that the accuracy rises to almost 80% after the 20th epoch.

IV. CONCLUSION
This paper presents the diagnosis of brain tumors based on MRI images. The diagnosis is performed using three DL algorithms: VGG-16, Inception-V3, and MobileNet-V2. A dataset of MRI brain tumor images is used for testing and evaluating the three algorithms. The main objective of this work is to identify the DL algorithm that performs best on small datasets. The test results show that VGG-16 achieves an accuracy of 79.00%, Inception-V3 achieves an accuracy of 84.00%, and MobileNet-V2 achieves an accuracy of 86.00%. The main constraint addressed in this project is that the dataset is small, so the models learn from far fewer image samples than DL normally requires. Nevertheless, MobileNet-V2 is found to be the best of the three algorithms at dealing with this constraint. In the future, the project will be improved by using well-known feature selection methods to select the most useful features, which can improve the performance of the DL algorithms.