Classification of Malaria Cell Image Using Inception-V3 Architecture

Malaria is a severe global public health problem transmitted by the bite of an infected female Anopheles mosquito. It can be cured, but only with early detection and prompt, effective treatment; if it is not diagnosed and treated at an early stage, it can lead to severe conditions and, in the worst case, death. This study aims to classify malaria cell images using the Inception-V3 architecture. Training was conducted on 27,558 malaria cell images under three proposed scenarios. Scenario 1 applies the SGD optimizer and yields a loss of 0.13 and an accuracy of 0.95; scenario 2 applies the Adam optimizer and yields a loss of 0.09 and an accuracy of 0.96; and scenario 3 applies the RMSprop optimizer and yields a loss of 0.08 and an accuracy of 0.97. Across the three scenarios, the Inception-V3 model with the RMSprop optimizer provides the best result, with an accuracy of 97% and the lowest loss value. The test results further confirm that the proposed model is capable of classifying malaria cells effectively.


I. INTRODUCTION
Malaria is a serious and potentially life-threatening disease caused by the Plasmodium parasite, and it is transmitted to humans through the bite of an infected Anopheles mosquito [1]. The disease affects millions of people worldwide, particularly in sub-Saharan Africa, where it is a leading cause of death among children under five years old [2]. The accurate and timely diagnosis of malaria is critical for the effective treatment and control of the disease [3]. However, traditional diagnostic methods, such as microscopy and rapid diagnostic tests (RDTs), have limitations and can be time-consuming and labor-intensive [4].
Deep learning, a branch of machine learning, has emerged as a powerful tool for classifying medical images [5]. In recent years, deep learning-based models have been applied to a wide range of medical imaging tasks, including the classification of skin lesions, brain tumors, and lung cancer [6]. With the availability of large datasets of microscopy images of malaria-infected blood cells, deep learning-based models have also been applied to classify malaria [7].
Deep learning models, such as convolutional neural networks (CNNs) and deep belief networks (DBNs), have been trained to classify malaria-infected blood cells with high accuracy [8]. These models have been trained on a large dataset of microscopy images of blood cells, and have been shown to be able to accurately classify the images as infected or not infected with the Plasmodium parasite [9], [10].
One of the key advantages of deep learning-based models is their ability to learn features from the data, which can be used to classify the images. CNNs, in particular, have been shown to be effective in this task, as they are able to automatically extract features from the images, such as the shape and texture of the cells, which are important for the classification of malaria [11].
In addition to classification, deep learning-based models have also been applied to other tasks related to diagnosing malaria, such as the segmentation of infected cells in microscopy images and detecting the Plasmodium parasite in blood smears. These models have been trained on large datasets of images and have been shown to be able to accurately segment and detect the parasite, which can aid in the diagnosis of malaria [12].
Deep learning-based models are thus a powerful tool for classifying malaria-infected blood cells. Trained on large datasets of microscopy images, they classify the images as infected or not infected with the Plasmodium parasite with high accuracy [13]. With the continuing advancement of deep learning, these models are expected to play an increasingly important role in the future diagnosis and treatment of malaria [14].
Malaria is a disease that humans contract through the bite of a female Anopheles mosquito carrying Plasmodium parasites, such as Plasmodium falciparum, Plasmodium vivax, Plasmodium malariae, and Plasmodium ovale. It is considered a serious infection caused by peripheral blood parasites of the genus Plasmodium [15].
Malaria is commonly acknowledged as a disease that often occurs in coastal areas. It affects people regardless of age and gender, and sufferers typically experience flu-like symptoms, high fever, chills, and headaches. Symptoms typically appear ten days to four weeks after infection in the form of fever, headache, vomiting, chills, anemia, and an enlarged spleen [16].
Hence, early detection of malaria-infected blood cells is vital; it is carried out by experts trained to detect malaria infection, who monitor blood cell counts annually [17], [18]. Malaria detection involves manually counting parasites and infected red blood cells, so it depends heavily on the experience and skill of the microscopist. Such expertise is therefore in high demand; however, limited resources and systems can degrade diagnostic quality and lead to incorrect diagnostic decisions [19].
Prior studies have addressed the classification of malaria-infected blood cells. In 2019, Reddy and Juliet [8] sought accurate diagnostic results on microscopic malaria cell images by employing the convolutional neural network (CNN) method with the ResNet-50 architecture. In 2020, Sayyed et al. [20] compared the effectiveness of convolutional combinations and neural networks in detecting malaria parasites.
The CNN is a popular method for processing data with a grid-like topology, such as two-dimensional images. It uses convolution in place of general matrix multiplication in at least one of its layers [21]. The CNN is a type of deep learning, a branch of machine learning that teaches computers to perform a task through a training process. The method is therefore effective for datasets with a two-dimensional structure [22]. This study aims to improve the performance of the proposed Inception-V3 model to obtain better accuracy than that reported in previous studies.

II. MATERIAL AND METHOD
This section discusses the implementation of the CNN method in classifying malaria cell images with the Inception-V3 architecture.

A. Dataset
The dataset used is the malaria cell image dataset [8], [23], which contains 27,558 images divided into two classes: 13,775 images in the parasitized class and 13,813 images in the uninfected class. Sample parasitized and uninfected images are shown in Fig. 1 and Fig. 2. The parasitized class contains images of blood cells infected with malaria, while the uninfected class contains images of blood cells that are not infected. The data were obtained from kaggle.com [24]; the original data come from the official NIH website [25] and were later uploaded to kaggle.com for open access. The dataset is then divided into train, validation, and test data in proportions of 70%, 15%, and 15%, respectively.

B. Pre-processing
Image augmentation is a technique used to artificially increase the size of a dataset by applying various modifications to the existing images. These modifications include transformations such as rotation, flipping, and cropping, as well as changes in brightness, contrast, and color. The goal of image augmentation is to create new, diverse images from the original ones, which can be used to improve the performance and robustness of machine learning models that are trained on image data.
One of the main benefits of image augmentation is that it can help reduce overfitting, a common problem in deep learning models. Overfitting occurs when a model is trained on a small dataset and becomes too specialized to the training data, leading to poor performance on new, unseen data. By augmenting the training dataset with diverse, modified versions of the original images, the model is exposed to a wider range of variations in the data, making it more robust to new, unseen data.
Another benefit of image augmentation is that it can help to improve the generalization of models trained on image data. In other words, it can make the models more accurate in recognizing objects in images that were not part of the training dataset. This is especially important when the available dataset is small or has certain biases, and image augmentation can help to mitigate these issues. In this study, data augmentation is performed with the Keras library (ImageDataGenerator) to counter overfitting, so that the model can correctly predict malaria cell images [26]. Pre-processing also includes resizing each image to the input size required by the CNN architecture [27].
Augmentation is applied to both the train data and the validation data during model learning, so the validation data that the model is asked to predict contains both original images and images produced by augmentation. The images in the validation data and the test data are therefore similar in content, but because the test data does not pass through the same augmentation process, the model's predictions on the test data may differ from those on the validation data.
The data augmentation parameters used with the image data generator are listed in Table I. For the train data and validation data, the parameters are rescale = 1./255, horizontal_flip = True, vertical_flip = True, rotation_range = 90, height_shift_range = 0.2, width_shift_range = 0.2, and zoom_range = 0.2. For the test data, only rescale = 1./255 is applied. These parameters are chosen to obtain the expected results from the training process of the proposed model.
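The parameters listed above map directly onto keyword arguments of the Keras ImageDataGenerator class. The following is a minimal sketch of how such a configuration might look, not the authors' actual code; the names TRAIN_VAL_AUGMENTATION, TEST_AUGMENTATION, and make_generators are illustrative assumptions, and the import path assumes a TensorFlow 2.x release in which tf.keras.preprocessing.image.ImageDataGenerator is still available (it is marked as deprecated in newer releases).

```python
# Augmentation settings as listed in the text (Table I).
# The dictionary and function names are illustrative, not from the paper.
TRAIN_VAL_AUGMENTATION = {
    "rescale": 1.0 / 255,        # scale pixel values to [0, 1]
    "horizontal_flip": True,
    "vertical_flip": True,
    "rotation_range": 90,        # random rotation of up to 90 degrees
    "height_shift_range": 0.2,   # vertical shift, fraction of image height
    "width_shift_range": 0.2,    # horizontal shift, fraction of image width
    "zoom_range": 0.2,
}

# The test data is only rescaled, never augmented.
TEST_AUGMENTATION = {"rescale": 1.0 / 255}


def make_generators():
    """Build Keras image generators from the settings above.

    TensorFlow is imported lazily so that the settings can be inspected
    on a machine where it is not installed.
    """
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    train_val_generator = ImageDataGenerator(**TRAIN_VAL_AUGMENTATION)
    test_generator = ImageDataGenerator(**TEST_AUGMENTATION)
    return train_val_generator, test_generator
```

Each generator would then typically be pointed at an image folder with its flow_from_directory method, with the target image size and batch size chosen to suit the model; those values are not stated in the text.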

C. Inception-V3
A convolutional neural network (CNN) is a method for classifying image data [28] that specializes in image recognition problems [8]. The advantage of the CNN model lies in its hierarchical structure of learning layers, which can be trained intensively once the model topology matches the input features. The model works efficiently by exploiting the spatial relationships of visual patterns to reduce the number of parameters, which improves accuracy [29].
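To illustrate how a convolution exploits spatial relationships with few parameters, the sketch below slides a small filter over a toy image in plain Python. It is an illustration only: the function name conv2d_valid, the 4x4 image, and the 2x2 edge filter are all made up for this example, and, as in most deep learning libraries, the operation shown is technically a cross-correlation (the filter is not flipped).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the filter.
            acc = 0.0
            for di in range(kernel_h):
                for dj in range(kernel_w):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Toy 4x4 "image" with a vertical edge between the dark and bright halves.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# A 2x2 filter that responds to dark-to-bright transitions from left to right.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, edge_kernel)
for row in feature_map:
    print(row)
```

The same four filter weights are reused at every image position, which is why a convolutional layer needs far fewer parameters than a fully connected layer over the same input.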
Inception V3 is a convolutional neural network (CNN) architecture that Google developed for image classification tasks. It is the third version of the Inception architecture and was introduced in 2015 by Szegedy et al. [30]. Inception V3 is built upon the concepts of the previous Inception architectures, and it is designed to improve the performance and efficiency of image classification tasks. The architecture of Inception V3 is a deep and complex one, consisting of a stack of inception modules that are interconnected with each other. Each inception module combines different types of convolutional and pooling layers designed to extract different features from the input image.
One of the key features of Inception V3 is its use of factorized convolutions, which reduce the number of parameters in the network while maintaining a high accuracy level. A factorized convolution replaces one large filter with a sequence of smaller ones, for example a 5x5 convolution with two stacked 3x3 convolutions, so that the same receptive field is covered with fewer weights. The Inception V3 network uses a combination of 1x1, 3x3, and 5x5 convolutional filters to extract features from the input image. The 1x1 convolutional filters are used to reduce the dimensionality of the input, while the 3x3 and 5x5 filters are used to extract more complex features from the input image.
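The parameter saving from factorized convolutions can be checked with simple arithmetic. The sketch below counts the weights for two factorizations described in the Inception-V3 paper [30]: one 5x5 filter replaced by two stacked 3x3 filters, and one 3x3 filter replaced by a 1x3 filter followed by a 3x1 filter. The channel width of 64 and the helper name conv_weights are arbitrary assumptions for illustration, and biases are ignored.

```python
def conv_weights(kernel_h, kernel_w, in_channels, out_channels):
    """Number of weights in one convolution layer (biases ignored)."""
    return kernel_h * kernel_w * in_channels * out_channels


channels = 64  # illustrative channel width, not taken from the paper

# One 5x5 convolution versus two stacked 3x3 convolutions.
single_5x5 = conv_weights(5, 5, channels, channels)
stacked_3x3 = 2 * conv_weights(3, 3, channels, channels)

# One 3x3 convolution versus a 1x3 convolution followed by a 3x1 convolution.
single_3x3 = conv_weights(3, 3, channels, channels)
asymmetric = conv_weights(1, 3, channels, channels) + conv_weights(3, 1, channels, channels)

print(f"5x5: {single_5x5} weights, two 3x3: {stacked_3x3} weights")
print(f"3x3: {single_3x3} weights, 1x3 + 3x1: {asymmetric} weights")
```

With equal input and output widths, the stacked 3x3 pair uses 18/25 of the weights of the 5x5 filter, and the asymmetric pair uses 6/9 of the weights of the 3x3 filter, independent of the channel width chosen.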
Another important feature of Inception V3 is the use of batch normalization, a technique that normalizes the inputs to each layer of the network. Batch normalization helps to stabilize the training process and to reduce the internal covariate shift, which is the change in the distribution of a layer's inputs during training.
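As a rough illustration of what batch normalization computes, the sketch below normalizes a batch of activations for a single feature to zero mean and unit variance, then applies a learnable scale (gamma) and shift (beta). It covers only the training-time forward pass; at inference, running averages of the mean and variance are normally used instead. The function name and sample values are illustrative assumptions.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a batch, then scale by gamma and shift by beta."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    # eps guards against division by zero when the variance is tiny.
    return [gamma * (v - mean) / math.sqrt(variance + eps) + beta for v in values]


activations = [2.0, 4.0, 6.0, 8.0]  # made-up activations for one feature
normalized = batch_norm(activations)
print(normalized)
```

Whatever the scale of the incoming activations, the next layer sees values with roughly zero mean and unit variance, which is what keeps the input distribution of each layer stable during training.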
Inception V3 is pre-trained on a large dataset called ImageNet, which contains over 14 million images and 1000 different classes. This allows the model to be used for transfer learning on other image classification tasks with a smaller dataset, where it can be fine-tuned for a specific task. Transfer learning is a technique that allows a pre-trained model to be reused on a different task by fine-tuning the model on a new dataset. This can greatly reduce the amount of data and computational resources required to train a new model from scratch.
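A transfer learning setup of this kind could be sketched in Keras as follows. This is an assumption-laden illustration rather than the model used in this study: the function name build_transfer_model, the 299x299 input size (the Keras default for InceptionV3), the frozen backbone, the dropout rate, and the single sigmoid output unit for the two classes are all choices made for the example.

```python
def build_transfer_model(input_shape=(299, 299, 3), weights="imagenet", dropout_rate=0.5):
    """InceptionV3 backbone with a new binary head (parasitized vs. uninfected).

    All defaults are illustrative. TensorFlow is imported lazily so the
    function can be defined on a machine where it is not installed.
    """
    import tensorflow as tf

    # Pre-trained convolutional base without the original 1000-class classifier.
    base = tf.keras.applications.InceptionV3(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # freeze pre-trained features; set True to fine-tune

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)  # keep batch-norm statistics fixed
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(dropout_rate)(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)
```

Freezing the base trains only the new head, which is the cheapest form of transfer learning; unfreezing some or all of the base layers afterwards, usually with a small learning rate, fine-tunes the pre-trained features to the new images.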
Inception V3 has been widely used for image classification tasks and has been shown to perform very well on a variety of benchmarks. It has also been used in applications such as object detection, image segmentation, and video classification. Its use of factorized convolutions and Inception modules makes it a highly efficient and accurate model for image classification, and its pre-training on the ImageNet dataset makes it a versatile starting point for transfer learning on tasks with smaller datasets, reducing the amount of data and computational resources required compared with training a new model from scratch.
The Inception-V3 model has the advantage of efficient computation despite its more complex architecture: it contains roughly 24 million parameters, significantly fewer than VGG, and it does not use a stack of fully connected layers, replacing it with a pooling layer. Fewer parameters result in a smaller model size and enable faster computation [31].
The initial part of Inception-V3 consists of three standard convolution layers, followed by a max-pooling layer, two convolution layers, and another max-pooling layer. The next stage consists of inception modules, which convolve the input in parallel with filters of different sizes, concatenate the results, and pass them on through the network. Subsequent sections of the network repeat a number of times, with some sections repeating 10 or 20 times towards the end. The network uses a dropout layer, which randomly sets values to zero during training, to prevent overfitting. Furthermore, the second-to-last layer is fully connected [32].

III. RESULTS AND DISCUSSION
A. Data Sample Collection
The first stage is to split the data, dividing each class into train data, validation data, and test data. The train data is used to train the model, the validation data is used to check what the model has learned during training, and the test data is used in the final evaluation to determine the model's performance. The validation data plays a role similar to that of the test data, but it is consumed during training, whereas the test data is held out until evaluation. The data is divided into 70% train data, 15% validation data, and 15% test data; the result of the splitting process is shown in Table II.
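A per-class 70/15/15 split of this kind can be sketched with the Python standard library alone. The file names, the fixed random seed, and the function name split_class below are hypothetical; the text does not state how the images were shuffled or how rounding was handled, so this sketch simply gives any remainder to the test portion.

```python
import random


def split_class(items, train_pct=70, val_pct=15, seed=42):
    """Shuffle one class and split it into train, validation, and test lists.

    Integer percentages avoid floating-point rounding surprises; whatever
    is left after the train and validation portions goes to the test set.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    return {
        "train": items[:n_train],
        "validation": items[n_train:n_train + n_val],
        "test": items[n_train + n_val:],
    }


# Hypothetical file names standing in for the real image files.
parasitized = [f"parasitized_{i}.png" for i in range(1000)]
uninfected = [f"uninfected_{i}.png" for i in range(1000)]

splits = {
    "parasitized": split_class(parasitized),
    "uninfected": split_class(uninfected),
}
for class_name, parts in splits.items():
    print(class_name, {name: len(files) for name, files in parts.items()})
```

Splitting each class separately keeps the class balance the same in all three subsets.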

B. Result
The first scenario employs the same optimizer as the reference study [8], namely the stochastic gradient descent (SGD) optimizer. The SGD optimizer is applied to the Inception-V3 model with a learning rate of 0.001 and a 70:15:15 data split. In scenario 1, training took 3 hours, 18 minutes, and 58 seconds. Fig. 3 plots the training and validation accuracy for scenario 1, and Fig. 4 plots the training and validation loss. Table III presents the classification report for the Inception-V3 model in scenario 1.
In scenario 2, the Adam (Adaptive Moment Estimation) optimizer is applied to the Inception-V3 model. Adam maintains individual learning rates for different parameters and is therefore expected to provide better accuracy. In addition, Adam requires little memory and is computationally light during training [23]. The accuracy and loss for scenario 2 are depicted in Figs. 5 and 6.
The test results, particularly those presented in Table VI and Table VII, show that the best model performance is obtained in scenario 3. This scenario employs the RMSprop optimizer with a learning rate of 0.0001 and a training process of 50 epochs, and it performs better than the other models. The Adam optimizer in scenario 2 reaches similar accuracy, but scenario 3 has a lower loss value, and hence a lower probability of prediction failure, as well as the highest training accuracy among the scenarios.
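The three scenarios differ only in the optimizer and its learning rate, so they can be summarized as a small configuration table. The sketch below is one way this might be wired up in Keras, not the authors' code: the names SCENARIOS and make_optimizer are illustrative, and the Adam learning rate of 0.001 is taken from the conclusion of this paper.

```python
# Optimizer settings for the three scenarios, as stated in the text.
SCENARIOS = {
    "scenario_1": {"optimizer": "SGD", "learning_rate": 0.001},
    "scenario_2": {"optimizer": "Adam", "learning_rate": 0.001},
    "scenario_3": {"optimizer": "RMSprop", "learning_rate": 0.0001},
}


def make_optimizer(scenario_name):
    """Create the Keras optimizer for one scenario.

    TensorFlow is imported lazily so the settings can be inspected
    on a machine where it is not installed.
    """
    import tensorflow as tf

    config = SCENARIOS[scenario_name]
    # SGD, Adam, and RMSprop are all classes in tf.keras.optimizers.
    optimizer_class = getattr(tf.keras.optimizers, config["optimizer"])
    return optimizer_class(learning_rate=config["learning_rate"])
```

The returned optimizer would be passed to the model's compile method together with a loss function and an accuracy metric; binary cross-entropy would be a natural loss for the two-class problem, although the text does not state which loss was used.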

D. Comparison of Architectural models with Previous Studies
The best-performing model is further compared with the models of previous studies, namely the ResNet50 model with the SGD optimizer [8] and a convolutional neural network model [33], as shown in Table VIII. Table VIII shows that the proposed model achieves a train accuracy of 96%, which is 0.01 higher than the previous ResNet50 model [8] and the CNN model with a three-convolutional-layer architecture [33]. In sum, the accuracy-based comparison in Table IX confirms that the proposed scenario improves accuracy by 0.02 over previous studies.

IV. CONCLUSION
Based on the test scenarios built on the Inception-V3 architecture, the RMSprop optimizer with a learning rate of 0.0001 is found to deliver better performance than the other scenarios, which employed the Adam optimizer with a learning rate of 0.001 and the SGD optimizer with a learning rate of 0.001. The scenario proposed in this study improves on the accuracy of the previous study, reaching 97%, and thereby provides better performance.