ON INFORMATICS

Abstract: Pneumonia is one of the world's leading causes of mortality, especially among children. Chest X-rays (CXR) play an important part in diagnosing pneumonia because the technology is cost-effective and has advanced quickly. Detecting pneumonia from CXR is nevertheless a challenging and time-consuming process that requires trained professionals. Automation through machine learning has been developed to address this issue. Moreover, Deep Learning (DL), a subfield of machine learning that uses algorithms loosely modeled on the human brain, can predict more accurately and is now dependable enough to predict pneumonia. A further development in DL is Transfer Learning, in which specific layers are taken from a pre-trained network and reused on other datasets, reducing training time and improving model performance. Although numerous algorithms are already available for pneumonia identification, comprehensive literature evaluations and clinical recommendations are still few in number. This research will assist practitioners in choosing some of the best procedures from recent research, reviewing the available datasets, and comprehending the outcomes gained in this domain. The reviewed papers show that the best score for predicting pneumonia using DL from CXR was 99.4% accuracy. The techniques and results from the reviewed papers serve as useful references for future research.


I. INTRODUCTION
Pneumonia is a major cause of death globally, accounting for around 16% of all deaths of children under five years old worldwide [1], which makes it the world's leading cause of death among young children [2]. In Indonesia, it ranks second among the causes of death in toddlers. In 2019 alone, more than 400,000 pneumonia cases were reported in the country [3]. Pneumonia is an illness that causes the air sacs in one or both lungs to become inflamed. The air sacs may fill with fluid or pus, resulting in a cough with phlegm or pus, fever, chills, and trouble breathing. Pneumonia can be caused by several types of organisms, including bacteria, viruses, and fungi [4].
Because several types of organisms can cause pneumonia, the first word of the name indicates the cause. For example, bacterial pneumonia is caused by bacteria, viral pneumonia is caused by a virus, and fungal pneumonia is usually caused by breathing in certain fungal spores [5].
The early diagnosis of pneumonia may start with a simple lung examination using a stethoscope; if pneumonia is suspected, additional tests might be required. In clinics, imaging modalities such as magnetic resonance imaging (MRI), computed tomography (CT), and CXR are used to diagnose pneumonia. CXR is the most often used method to identify pneumonia globally due to its low cost and ease of access [6].
Chest radiographs give valuable information about a patient's status; yet even highly experienced radiologists have difficulty recognizing pneumonia on CXRs, since these images show opacities similar to those of other lung conditions such as lung cancer and excess fluid. Therefore, traditional pneumonia identification from CXR images is time-consuming and imprecise, potentially delaying diagnosis and treatment. Given the critical need for early diagnosis and the complexity of interpreting these images, computer-aided detection (CAD) systems represent an exciting approach for automated detection that can assist physicians in overcoming the aforementioned issues and improving detection accuracy in a clinical setting [7].
CAD systems can complement the decision-making of medical staff by combining elements of computer vision and machine learning (ML) with images such as radiographs to detect and bring out patterns [8]. A typical CAD system analyzes input data (i.e., CXRs) in consecutive stages, extracting and classifying its characteristics. The first stage preprocesses the CXR data. The second stage extracts features from the input images using Gaussian filters, morphological operations, and edge detection. The third stage classifies the extracted features using an appropriate classifier, such as a support vector machine (SVM), a random forest (RF), or a neural network. In the years that followed, ML and Deep Learning (DL) grew rapidly. Compared to classical ML, DL requires more data; however, it behaves in a more humanlike way and can make predictions without much human intervention [9].
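The three CAD stages described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the pipeline of any reviewed paper: the images are synthetic stand-ins generated by a hypothetical `synthetic_cxr` helper, morphological operations are omitted, and a nearest-centroid rule is swapped in for the SVM or random forest classifier to keep the example dependency-free.

```python
import numpy as np

GAUSS = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float) / 16.0
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)

def preprocess(img):
    """Stage 1: scale 8-bit pixel values to the range [0, 1]."""
    return np.asarray(img, dtype=float) / 255.0

def filter3x3(img, kernel):
    """Apply a 3x3 kernel (cross-correlation) with edge padding; shape is preserved."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def extract_features(img):
    """Stage 2: Gaussian smoothing, then Sobel edge strength, summarised as two numbers."""
    smooth = filter3x3(img, GAUSS)
    edges = np.hypot(filter3x3(smooth, SOBEL_X), filter3x3(smooth, SOBEL_X.T))
    return np.array([smooth.mean(), edges.mean()])

def fit_centroids(features, labels):
    """Stage 3 (training): one mean feature vector per class (stand-in for SVM/RF)."""
    return {int(c): features[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(centroids, feature):
    """Stage 3 (inference): choose the class whose centroid is closest."""
    return min(centroids, key=lambda c: np.linalg.norm(feature - centroids[c]))

def synthetic_cxr(rng, with_opacity):
    """Hypothetical stand-in for a CXR: dark noisy background, optional bright patch."""
    img = 50.0 + rng.integers(0, 10, size=(32, 32))
    if with_opacity:
        img[10:22, 10:22] = 200.0
    return img

rng = np.random.default_rng(0)
labels = np.array([0, 1] * 5)
train = np.array([extract_features(preprocess(synthetic_cxr(rng, bool(y)))) for y in labels])
centroids = fit_centroids(train, labels)
```

Each stage is a separate function, so any one of them (for example the classifier) could be replaced without touching the others, which mirrors the staged structure of the CAD systems described above.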
Artificial Neural Networks (ANNs), one family of DL models, made a comeback in the 2010s. Although the concept was first proposed in 1943, an ANN created by Google achieved remarkable results by defeating the world champion of the board game Go [9]. This event is regarded as one of the most notable achievements in artificial intelligence (AI) and accelerated AI research, especially on ANNs and DL, which are now used for tasks such as processing sensor data and recognizing images and speech. Unsurprisingly, AI, particularly DL, and its use in medicine have advanced quickly in recent years, owing to the capacity of convolutional neural networks (CNNs) to classify images correctly. Today, AI is employed in a variety of medical specialties, radiology being one example. AI and ML are generally used in domains that have large collections of medical data, typically in the form of digital images, that can be used to train models. The application of AI in support systems for medical image processing has been recommended to increase reporting accuracy, consistency, and time efficiency [9]. Therefore, in this systematic literature review, we aimed to evaluate the various methods and datasets connected to Convolutional Neural Networks applied to chest X-ray images.

A. Review Method
Digital libraries can be used for preliminary searches for primary studies, but this is insufficient for a comprehensive assessment. Other sources of evidence, such as journals and conference papers, must also be examined [10]. We searched using keywords that relate to the title or content. The search string is: ("pneumonia" OR "lung diseases") AND ("X-rays" OR "CXR" OR "radiography") AND ("deep learning" OR "convolutional neural network" OR "CNN" OR "classification"). Based on the primary studies collected, we formulated the research questions that drive the entire systematic review methodology [10]. To ease data extraction and analysis, we created the following research questions:
• RQ1: Which is the most used chest X-ray image dataset for pneumonia detection?
• RQ2: What data preprocessing methods are commonly used for pneumonia detection?
• RQ3: What combination of techniques improves the performance of a Deep Convolutional Neural Network?
To filter the collected primary studies, we created a table of inclusion and exclusion criteria relevant to our research objective, as shown in Table 1.
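The search string above is a conjunction of three OR-groups. As a rough sketch of how that logic applies to a candidate title or abstract, the following function checks that every group contributes at least one term. The function name and the sample titles are hypothetical, and real digital libraries apply their own tokenization and stemming, so this plain substring match is only an approximation.

```python
# Each tuple is one OR-group; the groups are combined with AND.
SEARCH_GROUPS = [
    ("pneumonia", "lung diseases"),
    ("x-rays", "cxr", "radiography"),
    ("deep learning", "convolutional neural network", "cnn", "classification"),
]

def matches_search_string(text, groups=SEARCH_GROUPS):
    """Return True if every AND-group has at least one OR-term present in the text."""
    lowered = text.lower()
    return all(any(term in lowered for term in group) for group in groups)

# Hypothetical titles, used only to exercise the logic.
print(matches_search_string("Pneumonia detection from CXR using deep learning"))  # True
print(matches_search_string("Pneumonia treatment guidelines for children"))       # False
```

Because the match is on literal substrings, a title that says "X-ray" in the singular would not satisfy the second group unless it also contains "CXR" or "radiography"; a production screening script would need to handle such variants.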

B. Review Analysis
After collecting the papers, we assess the eligible papers based on our research questions. Then, we compare the methodologies utilized to evaluate the advantages and disadvantages. The following is our analysis of the publications we read.

1) RQ1: Which is the Most Used Chest X-Ray Image Dataset for Pneumonia Detection?
Based on Table 1, among the papers we reviewed, Kermany's Chest X-Ray Images dataset [26] is the most used, appearing in 15 papers. The second most used is the Kaggle RSNA Pneumonia Detection dataset, used in five papers. The third is the Chest X-Ray14 (NIH) dataset, used in three papers, and the fourth is the JSRT-compiled dataset, used in two papers. The details of the four most-used datasets are shown in Table 2 and described below.
• Kermany et al. [26] published "Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification," which contains around 1.2 GB of data. It has three folders containing the test, training, and validation data, each with two subfolders, one for each image category. There are 5,863 X-ray images (JPEG) classified as normal, bacterial pneumonia, and viral pneumonia.
• Wang et al. [6] made their dataset freely available on Kaggle. It comprises 112,120 frontal chest X-ray images taken from 30,805 people. Each radiograph in the collection is associated with one or more of fourteen different thoracic diseases. The labels were generated with Natural Language Processing (NLP) by text-mining disease classifications from the related radiological reports, and they are estimated to be more than 90% accurate. Apart from being referred to as ChestX-ray14 [31], [33], this dataset is sometimes referred to as the National Institutes of Health (NIH) dataset [30], [34], since it is available from the National Institutes of Health. Because of its size, the dataset is frequently used in the form of subsets.
• One such subset is the RSNA pneumonia detection dataset [29]-[31]. The training set consists of about 25,684 chest X-ray images, and the test set has 1,000 images, each with a resolution of 1024 x 1024. The images are grouped according to their diagnosis. It is worth mentioning that the dataset includes a substantially greater proportion of pneumonia-negative images, and that it is a subset of the ChestX-ray8 dataset [6].
• [41].
The MC dataset was compiled in collaboration with the Department of Health and Human Services of Montgomery County, Maryland, USA. It contains 138 frontal X-ray images in PNG format, including 58 cases with tuberculosis (TB) manifestations and 80 normal cases. It also includes information such as the patient's gender, age, and any lung anomalies. Of the images, 63 are from male patients, 74 are from female patients, and one is from a patient of unknown gender. The resolution is either 4892 x 4020 or 4020 x 4892 pixels. However, this dataset does not contain any pneumonia samples; hence, it is not relevant for pneumonia detection [42].

2) RQ2: What data preprocessing methods are commonly used for pneumonia detection?
In deep learning, preprocessing is usually applied to the data before it is passed to the model. These data preprocessing combinations are believed to improve model performance and help the CNN generalize to noisy real-world data (see Table 3). One reported result indicates that preprocessing can improve recognition accuracy, because the model is better able to focus on the features of each image and to give more weight to the region affected by pneumonia. The preprocessing methods commonly used by different researchers are explained below.
Image Preprocessing + Data Sorting 1
• Data Augmentation
The term "data augmentation" refers to a collection of strategies for artificially increasing the volume of data by producing additional data points from existing data. This may be accomplished by making minor adjustments to existing data or by employing deep learning models to produce new data points. Combining data augmentation with deep learning algorithms is expected to improve detection accuracy. In one approach, vertical and horizontal flips, random rotation, random brightness, gamma adjustment, random Gaussian noise, and blur are applied to the positive sample images. By adding regularization and improving generalization, this method can strengthen the training process of the network.
Data augmentation techniques are widely used to improve the performance of deep learning systems; here, the purpose is to improve the convolutional neural network model used to classify chest X-rays [28]. Images are usually resized before they are augmented. Besides helping the model generalize, augmentation can also help prevent overfitting to the training data, since real-world CXR images are noisy and still limited in number.
The data augmentation techniques most commonly used in general computer vision classification can be divided into position augmentation and color augmentation. Common examples are scaling, geometric transformations, flipping, shearing, and rotation.
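To make the augmentation operations named above concrete, the following NumPy sketch applies position changes (flips, rotation) and color changes (brightness, gamma, Gaussian noise) to a grayscale image. It is an illustration only: the parameter ranges are assumptions chosen for the example, rotation is restricted to 90-degree steps to avoid interpolation, and blur is left out.

```python
import numpy as np

def augment(img, rng):
    """Return a randomly perturbed copy of a grayscale image with values in [0, 1]."""
    out = img.copy()
    # Position augmentation
    if rng.random() < 0.5:
        out = np.fliplr(out)                        # horizontal flip
    if rng.random() < 0.5:
        out = np.flipud(out)                        # vertical flip
    out = np.rot90(out, k=int(rng.integers(0, 4)))  # rotation in 90-degree steps
    # Color augmentation (ranges below are illustrative assumptions)
    out = out + rng.uniform(-0.1, 0.1)                       # brightness shift
    out = np.clip(out, 0.0, 1.0) ** rng.uniform(0.8, 1.25)   # gamma adjustment
    out = out + rng.normal(0.0, 0.02, size=out.shape)        # additive Gaussian noise
    return np.clip(out, 0.0, 1.0)

# Example: produce four augmented variants of one (synthetic) image.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
variants = [augment(image, rng) for _ in range(4)]
```

Passing the random generator in explicitly makes the augmentation reproducible, which is useful when comparing runs with and without augmentation as several of the reviewed papers do.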
• Image Preprocessing
There are several forms of image preprocessing, but in general the goal is to improve the clarity of the image or to rescale it [13]-[16], [18], [19], [21], [23]-[25], [33], [37] so that the model can accept it; this is expected to increase training efficacy and to benefit the prediction outcomes. When working with chest X-ray radiographs, a low-quality radiograph gives less information on the patient's condition and might be the greatest obstacle to an accurate interpretation [31]. Therefore, filters [35] or image enhancement [29], [31] can be used to improve the images, and this enhancement can enrich the network with the first-stage data set, where the distribution of the first-stage training process serves as an extra regularization and generalization technique [29]. Additionally, images can be cropped to eliminate superfluous data [9]. In some pneumonia detection work, the lung regions in the CXR are segmented out [36] so that the noise in the rest of the image is discarded, and the model can focus on extracting features directly from the chest X-ray images. Filters such as the Sobel filter and the Laplacian filter can also be used to process the image. Both are edge detection filters that reduce the dimensionality of the image information through contour extraction and then emphasize the feature quantity [35].
• Data Sorting
The data sorting algorithm classifies images into low- and high-quality categories. The subjective quality of an image governs the resources and processing required for accurate analysis and so acts as a major initial impediment to an expedited workflow. One way to reduce the time spent on corrective actions is to automate the sorting with a simple convolutional neural network trained on some presorted images [31].
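The Sobel and Laplacian filters mentioned under image preprocessing are both small 3x3 kernels slid over the image. The sketch below uses the standard textbook kernels with a hand-written filtering helper (`filter3x3` is a hypothetical name, and edge padding is an assumed border policy); it is not taken from [35].

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T
LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)

def filter3x3(img, kernel):
    """Slide a 3x3 kernel over the image (cross-correlation) with edge padding."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def sobel_magnitude(img):
    """First-derivative edge strength: large where intensity changes quickly."""
    return np.hypot(filter3x3(img, SOBEL_X), filter3x3(img, SOBEL_Y))

def laplacian(img):
    """Second-derivative response: changes sign across an edge."""
    return filter3x3(img, LAPLACIAN)

# A vertical step edge: dark left half, bright right half.
step = np.zeros((6, 6))
step[:, 3:] = 1.0
print(sobel_magnitude(step)[0])  # [0. 0. 4. 4. 0. 0.]
print(laplacian(step)[0])        # [ 0.  0.  1. -1.  0.  0.]
```

Both outputs are zero in the flat regions and non-zero only around the step, which is the sense in which these filters discard most of the image information and keep the contours.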

3) RQ3: What combination of techniques improves the performance of various Convolutional Neural Networks?
From Table 4, it is apparent that researchers commonly use Transfer Learning in their methods. Many researchers have used transfer learning to take certain layers from pre-trained neural networks and reuse them on other datasets, which has been shown to reduce the training time needed and improve the neural network's performance. The idea rests on the fact that information learned from one problem can be reused for other, related problems [17]. Through transfer learning, creating a more accurate model on small datasets becomes possible by using the features from neural networks that have already been trained.
 | CNN with Data augmentation + resizing | Accuracy: 88.90%
[20] | CNN with Data augmentation and without data preprocessing | Accuracy with augmentation: 83.38%; accuracy without augmentation: 80.25%
[21] | Image pre-processing + 3-layer CNN | 88.68% accuracy
[22] | Image pre-processing + VGG16 + modified multilayer perceptron |
Based on this information, and given the limited number of CXR images available, transfer learning is the preferable method for developing an accurate pneumonia detection classifier. Overall, most of the techniques use data augmentation and image preprocessing as preprocessing methods in combination with a CNN. The combinations of techniques that include preprocessing generally scored higher than those without it.
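The feature-extraction form of transfer learning described above has a simple structure: the reused layers are frozen, and only a new classifier placed on top of them is trained on the target data. The NumPy sketch below shows that structure only. It is a toy under loud assumptions: `W_BACKBONE` is a fixed random projection standing in for genuinely pre-trained layers (in practice these would come from a network such as VGG16 or ResNet50), the data is synthetic, and the new head is a plain logistic regression.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: stand-in for pre-trained layers. Real transfer learning would load
# weights learned on a large source dataset instead of drawing them at random.
W_BACKBONE = rng.normal(size=(64, 16)) / 8.0

def backbone(x):
    """Frozen feature extractor: its weights are never updated below."""
    return np.maximum(x @ W_BACKBONE, 0.0)

def log_loss(w, b, feats, labels):
    """Mean binary cross-entropy of the head on the given features."""
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p)))

def train_head(feats, labels, epochs=200, lr=0.1):
    """Train only the new classification head with gradient descent."""
    w, b = np.zeros(feats.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
        grad = p - labels
        w -= lr * (feats.T @ grad) / len(labels)
        b -= lr * grad.mean()
    return w, b

# Hypothetical small target dataset: 40 samples, 64 inputs, two classes.
labels = np.array([0, 1] * 20, dtype=float)
inputs = rng.normal(size=(40, 64)) + labels[:, None]
feats = backbone(inputs)            # features are computed once and reused
w, b = train_head(feats, labels)
```

Because the backbone is frozen, its features can be computed once and cached, and only the 17 head parameters are optimized. This is the mechanism behind the shorter training times that the reviewed papers attribute to transfer learning.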

III. RESULTS AND DISCUSSION
This section compares and examines the current methodologies and datasets used to identify pneumonia from CXRs. The examination focuses mostly on robustness and utility.
Table 5 presents the references, the methods used in the papers, the datasets, and the performance, followed by the advantages and challenges. Numerous architectures are used, based on networks such as DenseNet201, ResNet-18, and VGG16. The preprocessing methods also vary, for example data augmentation, image preprocessing, and data sorting. In addition, some papers do not use any preprocessing method.
In terms of accuracy, a considerable number of papers achieve more than 90%, while some use different metrics and some score poorly. However, because different datasets are used, the performance cannot be ranked easily.
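Since the reviewed papers report different metrics, it helps to see how the common ones derive from the same four confusion-matrix counts. The following sketch uses the standard definitions; the function name and the example counts are hypothetical and do not come from any reviewed paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical counts for a balanced test set.
balanced = classification_metrics(tp=40, fp=10, tn=45, fn=5)

# Hypothetical imbalanced test set where the model always predicts "normal".
always_negative = classification_metrics(tp=0, fp=0, tn=90, fn=10)
```

In the second case the accuracy is 0.90 even though recall is 0.0, which is one reason accuracy figures from datasets with different class balances cannot be ranked directly.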

Reference | Methods | Challenges
[11] | Basic CNN Architecture with Data augmentation | Since the author uses a different approach from transfer-based learning or a more complex model, they encountered challenges in finding a suitable classification architecture.
 | DCNN ResNet-50 without preprocessing | The CNN models do not perform as well on external data as on internal data.
[12] | Inception ResNetV2, Xception, DenseNet201, and VGG19 with Fine Tuning and Data Augmentation | The author experimented with different approaches: without data augmentation, with data augmentation, and with fine-tuning. Each approach has its own advantages and disadvantages. Fine-tuning can achieve higher accuracy than training without any data augmentation; however, it trains more slowly than the other approaches.
[18] | Basic CNN and Transfer Learning model (VGG16, VGG19, and Inception V3) with Data augmentation + resizing | The model performs very well on the dataset used by the author. However, it should be validated on different datasets to confirm its performance.
[19] | CNN with Data augmentation + resizing | The performance almost reached 90%; however, the model is likely to overfit due to the small size of the training dataset, which can affect the model performance.
[20] | CNN with Data augmentation and without data preprocessing | The models with and without data augmentation produce similar results.
[37] | Transfer Learning model (VGG16) with Data augmentation + resizing | Although the AUC is around 96%, the accuracy could be improved since it only reaches 50%.
[27] | Residual Network and Mask Regional CNN without preprocessing | Due to the low image quality, Mask R-CNN did not perform well since the pneumonia features are hard to obtain. The imbalance between pneumonia and non-pneumonia data can also affect each model's performance.
[28] | CNN with data augmentation | The model performance can be improved.
[29] | Data augmentation + image enhancement + ensemble of RetinaNet and Mask R-CNN | The detection performance is still inadequate because the dataset is small and the pneumonia regions are tiny, making detection difficult.
[30] | Data augmentation + weighted voting + ensemble of Mask R-CNN and RetinaNet | The model performance can be improved.
[35] | Image pre-processing + 3-middle-layer CNN | Lack of data; only a small amount of data was used. The performance was not high, and the range of uses was limited.
[39] | Data augmentation + 4-layer CNN + AlexNet | Lack of data; only a small amount of data was used (100 in each class). Currently not reliable and has not met the standard of the medical industry.
[33] | Image pre-processing + DenseNet (feature extractor) + SVM (classifier) | Only frontal chest X-rays were used. Very high computational power is required because many convolutional layers are used.
[21] | Image pre-processing + 3-layer CNN | Creating the model is time-consuming and costly, and additional parameters such as the oxygen supply in the blood and the patient's history must be considered to detect pneumonia.
[22] | Image pre-processing + VGG16 + modified multilayer perceptron | Limited dataset for generalization purposes.
[23] | Augmentation + Customized CNN | Creating a custom Deep Convolutional Neural Network takes too much time and might cause overfitting.
[24] | Augmentation + with Dropout Layer | Self-proposed CNN architectures are prone to overfitting, and the data is still insufficient.
[25] | Offline Augmentation + ResNet50 Architecture adjusted with additional layers | The proposed architecture is prone to overfitting if trained for more epochs and requires additional measures such as regularization and dropout.
[13] | Feature Extraction of InceptionV3 CNN Architecture + Machine Learning Classifier (SVM) | Most papers focus only on feature extraction rather than on the performance of each classifier. In addition, machine learning algorithms still cannot beat neural networks in terms of sensitivity.
[14] | Data Augmentation + 5 Ensembled Feature Extraction (Transfer Learning) | Training five large transfer learning models is computationally expensive, and this method offers little to no explanation of how the classification decision is made.
[15] | ResNet50 + Feature Vector + Dual Added Layer | A large variation in results raises concerns about the CNN's ability to generalize to real-world pneumonia lung data. Further investigation into the generalizability is needed.
[16] | Feature Extraction of AlexNet + SVM Classifier | It only represents two transfer learning algorithms and cannot cover transfer learning in general.
[17] | Feature Extraction of Trained Model with ResNet50 Network | It is hard to generalize two different datasets as one, since the first covers only ordinary pneumonia and the second covers the newer COVID-19 pneumonia.
[36] | Data Augmentation + Image Segmentation + AlexNet | The data for normal and pneumonia lungs are not balanced; additional data must be generated or augmented to create a generalized model.

Table 5 above summarizes the various methods used by different papers from different institutions. Some use the same dataset yet report different model performance. In addition, the metrics differ, varying among accuracy, AUC, recall, precision, sensitivity, specificity, and F1 score. The factor that most affects model performance is the dataset. The most used dataset is Kermany's, which contains around 5,856 images for both training and validation, followed by the ChestX-ray14 dataset with 112,120 X-ray images.
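Table 5 lists one model with an AUC of around 96% but an accuracy of only 50% [37]. One way such a gap can arise, which is offered here as an illustration and not as an account of what happened in that study, is that AUC measures how well scores rank positives above negatives, while accuracy depends on a fixed decision threshold. The scores below are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(scores, labels, threshold=0.5):
    """Accuracy after turning scores into class predictions at a fixed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Hypothetical model outputs: well ranked, but every score falls below 0.5.
scores = [0.30, 0.35, 0.40, 0.45, 0.10, 0.15, 0.20, 0.32]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(auc(scores, labels))             # 0.9375
print(accuracy(scores, labels))        # 0.5
print(accuracy(scores, labels, 0.25))  # 0.875
```

With the default threshold every sample is predicted negative, so accuracy equals the share of negatives; moving the threshold to 0.25 recovers most of the performance that the AUC already indicated. This is why papers that report different metrics are hard to compare on a single scale.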
In addition, the most frequently used data preprocessing method is data augmentation. Several studies stated that it helps the Deep Convolutional Neural Network (DCNN) model become more general and more robust to outliers and noisy data. Besides that, the algorithm or architecture of the neural network also directly affects the final performance. The ResNet, DenseNet, and VGG architectures are commonly used for transfer learning to improve the model's performance and shorten the training process.
It is also apparent from Figure 3 that researchers still face many challenges. The first problem is the insufficient amount of data. Training a DCNN on a small number of images can cause several problems. Some researchers had only frontal chest X-ray images for training [32], [33], which prevented the model from generalizing to real-world images [15], [22]. Others had models that overfit the training data [14], [18]-[20], [25]. With small images and unbalanced classes [27], [36], they were not able to create a good model even after data augmentation. In some cases, parameter tuning did not improve model performance effectively [20]. Lastly, some datasets contain low-quality images [27]. This challenge cannot be solved by data augmentation alone and results in a poorly performing, underfitted neural network. While many researchers had overfitted models, some had poor model performance that can still be improved in some cases [28], [30], [31], [37].
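For the class-imbalance problem mentioned above, Table 5 notes that additional data has to be generated or augmented for the smaller class [36]. A minimal sketch of that idea is to draw extra indices from the minority class until every class is as large as the biggest one; each extra draw would then be passed through an augmentation step so that it is not an exact duplicate. The function name is hypothetical, and this simple random oversampling is an assumption for illustration, not the procedure of any reviewed paper.

```python
import random

def oversample_minority(labels, seed=0):
    """Return sample indices in which every class appears as often as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    target = max(len(indices) for indices in by_class.values())
    balanced = []
    for indices in by_class.values():
        balanced.extend(indices)                                        # keep every original
        balanced.extend(rng.choices(indices, k=target - len(indices)))  # extra draws to augment
    return balanced

# Hypothetical label list: 9 normal (0) and 3 pneumonia (1) images.
labels = [0] * 9 + [1] * 3
balanced_indices = oversample_minority(labels)
```

Oversampling should be applied to the training split only; duplicating images before the train/test split would leak near-identical samples into the test set and inflate the reported performance.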
The next general challenge is the time cost. Most researchers who used self-created or transfer learning models [11], [16] spent most of their time either designing the neural network architecture or training it [12], [14], [21], [24], [33]. Although these processes are time-consuming, they are unavoidable when the goal is a generalized and well-performing neural network. High computational power is also needed to process thousands of images, each containing tens of thousands of pixels.

IV. CONCLUSION
Overall, we reviewed 27 technical papers related to pneumonia classification using Deep Convolutional Neural Networks (DCNN). We examined the datasets used by different researchers and found that the most frequently used dataset is the one provided by Kermany et al. [26]. Although it is used frequently, some researchers still reported not having a sufficient amount of data. Further research can obtain additional data by gathering, generating, or integrating different data sources. Next, we studied the use of data augmentation and preprocessing methods, which proved successful in increasing DCNN model performance. We therefore suggest that researchers consider data augmentation as a preprocessing method, which saves time in finding the best preprocessing options. Lastly, we found that the best method for building the DCNN architecture is transfer learning, which achieved up to 99.4% accuracy. This can be done through feature extraction from different pre-trained models. With deep learning algorithms improving over time, new models with better architectures and performance will undoubtedly appear. Therefore, our recommendation for other researchers is to continue exploring new combinations of transfer learning models based on pre-trained DCNNs to obtain even better results.