ON INFORMATICS VISUALIZATION

In the last decade, the number of attacks on the internet has grown significantly, and the types of attacks vary widely. These attacks cause huge financial losses for institutions in both the private and government sectors. One way to deal with this problem is early detection of attacks, commonly handled by an intrusion detection system (IDS). An IDS is a hardware or software mechanism that monitors a network for malicious attacks, scanning it for potentially dangerous behavior or security threats. An IDS monitors network activity as either a Network-Based Intrusion Detection System (NIDS) or a Host-Based Intrusion Detection System (HIDS), and works by comparing signatures of known normal network activity with signatures of attack activity. In this research, a dimensional reduction and feature selection mechanism called Stacked Denoising Auto Encoder (SDAE) succeeded in increasing the effectiveness of Naive Bayes, KNN, Decision Tree, and SVM. The researchers evaluated performance using a confusion matrix, accuracy, recall, and F1-score. Compared with the results of previous works in the IDS field, our model increased effectiveness by more than 2% on the NSL-KDD dataset in both binary-class and multi-class evaluation. Moreover, traditional machine learning with SDAE also compared favorably with modern deep learning such as Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN). In the future, it is possible to integrate SDAE with a deep learning model to enhance the effectiveness of IDS detection.


I. INTRODUCTION
The number of internet users has increased significantly over the last decade. Additionally, advancements in technology, particularly in the internet, communication, and networking, have resulted in a massive amount of data being generated from a variety of sources, including industry, e-commerce portals, messengers, social media, and healthcare. This massive amount of data is referred to as big data and has four characteristics: high veracity, high velocity, wide variety, and high value. Since the advent of big data, the number of attacks has also increased. By 2019, more than 26 billion devices had been connected to the internet, which also contributes to the growth of malicious activity. The Intrusion Detection System (IDS) has therefore evolved into a critical tool for enhancing network and computer system security [1], [2].
Numerous experts, researchers, and academicians use conventional machine learning mechanisms to improve IDS, including Neural Networks (NN), Support Vector Machines (SVM), K Nearest Neighbors (KNN), Decision Tree 3 (DS3), Multi-Layer Perceptron (MLP), and Auto Encoder (AE). However, conventional shallow learning frameworks (one feedforward network) are ineffective in resolving the autodetection problem for big data. They consistently fail to detect attack activity, to capture attack information accurately, and to cope with noise in massive datasets [3], [4]. In response, deep learning models such as the deep Auto Encoder (AE), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Gated Recurrent Unit (GRU), and Long Short-Term Memory (LSTM) have become increasingly popular in recent years. An illustration of IDS detection is shown in Fig. 1 [5].
Additionally, the total number of attributes extracted from the internet data that an IDS must observe is always enormous, even in small-scale networks. Indeed, the majority of raw data is superfluous and noisy, and the presence of unsuitable features degrades the classifier's performance. It is therefore critical to employ dimensionality reduction techniques such as Principal Component Analysis (PCA), Mutual Information (MI), Chi-square, and UMAP [6]. Unlike the previous works, our experiment adopted SDAE for dimensional reduction. The detailed experiment scenario is shown in Fig. 2.
In this study, the researchers developed a novel dimensional reduction model based on SDAE, focusing on four aspects: 1) the hybridization between SDAE and KNN, 2) the hybridization between SDAE and Naive Bayes, 3) the hybridization between SDAE and SVM, and 4) the hybridization between SDAE and decision tree. We applied the proposed model to the NSL-KDD dataset.
Many previous works state that intrusion detection models follow three main methods: deep learning, conventional machine learning, and pattern similarity. Deep learning has become the most popular method in the last few years. In the beginning, pattern similarity models were mostly used to detect intrusions. Most of them use pattern matching as their core algorithm and rely on attribute similarity [7], [8]. Many such frameworks have been implemented in the past. Knuth-Morris-Pratt (KMP), Boyer-Moore (BM), Boyer-Moore-Horspool (BMH), Boyer-Moore-Horspool-Sunday (BMHS), Aho-Corasick (AC), and AC-BM were some of the traditional models used to build an Intrusion Detection System. Experimental results showed that these algorithms sped up pattern similarity calculations and reduced their running time. However, traditional pattern similarity models have a major limitation: they match patterns without any understanding of how an intrusion works. Finding a low-cost algorithm that reduces both the running time and the number of false positives became the main focus of this line of study. As machines become more intelligent, new studies in this area continue to appear.
Denning [9] was the first to propose machine intelligence for IDS, using a multi-algorithm model to detect intrusion activity. Following expert hypotheses, the model built patterns from several hand-crafted features. Subsequently, a machine learning model based on SVM was created [10]. The experiment used the KDD99 datasets and reached an accuracy of 91% with 3 features, 99% with 36 features, and 99% with 41 features.
A study employing traditional machine learning and KNN improved an early model. This model combined K-means clustering with a KNN classifier [12], and it evolved into the state-of-the-art IDS intelligence machine for malicious detection known as CANN. Another study proposed a traditional classifier based on Random Forest to improve CANN [11]. The hybrid model, which used Random Forest as its core classifier, achieved an accuracy of 94.7%. A Random Forest (RF) enhancement using an Artificial Neural Network (ANN) was proposed [12]. When applied to NSL-KDD, the ANN model produced more than 81% accuracy for malicious detection and 79% for network attack classification. A Decision Tree (DT) intrusion detection model based on NSL-KDD was proposed [13]. According to the experiment results, DT was effective in the IDS detection classification task. As the studies above show, enhanced traditional machine learning can be highly effective in IDS detection. However, most of these models require large-scale pre-processing and complex attribute extraction, which makes it difficult to handle large volumes of intrusion data with a traditional machine learning classifier.
Deep learning, a type of neural network with a very complex network structure, was introduced early in the last decade. It achieved tremendous performance in image classification tasks and has since become the industry standard for a variety of computer science problems such as image processing, voice recognition, text mining, and recommender systems [14], [15], [16], [17], [18], [19], including matrix factorization enhancement for recommender systems [20], location-based recommender systems for transportation services [21], product document representation to enhance collaborative filtering based on matrix factorization [22], [23], [24], and CNN-based document context for recommender systems [25].
A deep learning model based on Auto Encoder was proposed [26], using NSL-KDD to investigate the self-taught learning (STL) model. The model consists of two stages. The first stage learns a compact attribute representation from unlabeled data; the second trains a classifier on the learned representation using labeled data and performs the IDS classification. The experiment applied STL to two, five, and twenty-three classes. According to the results, STL achieved an accuracy of 88.39% in the 2-class case, while the 5-class classification achieved an accuracy of 79.10%. Another deep learning model was based on the combination of Deep Belief Networks (DBNs) and probabilistic neural networks [27]. The DBN is responsible for converting the raw data into a low-dimensional non-linear representation while retaining its important characteristics. Hidden layer learning is optimized using particle swarm optimization, and the Probabilistic Neural Network (PNN) performs the final classification for IDS detection. As demonstrated in their experiment, DBN-PNN achieved an accuracy of 93.25% and outperformed previous works that combined Principal Component Analysis (PCA) and PNN.
A study proposed another deep learning model for the IDS task based on a Deep Belief Network (DBN) [28], [29]. This model incorporates two critical processes: 1) the network is trained layer by layer using Restricted Boltzmann Machines (RBMs), and 2) each hidden layer vector is derived from the visible layer vector, with the hidden representation serving as the visible input of the following layer. A backpropagation network is then placed on top of the final RBM and takes the output vector generated by the RBM as its input vector. The DBN model achieves a measured accuracy of 95.25%, compared with 89.07% for backpropagation and 91.36% for SVM.
A Deep Neural Network (DNN) has also been considered suitable for use in IDS networks [30]. The DNN is structured as an auto encoder with four hidden layers and one hundred hidden units. The hidden layers are activated with Rectified Linear Units (ReLU), a non-linear activation function intended to improve performance on complex classification tasks. Adaptive moment estimation (Adam) was used as the stochastic optimizer. As demonstrated in the experiment, the DNN achieved a measured accuracy of 99%.
A novel model for IDS detection using Convolutional Neural Networks (CNN) has been proposed [31]. The CNN model is well suited to a variety of image processing problems. For IDS detection, the author assumed that the image processing problem is similar to the IDS problem in terms of data vector dimension. CNNs are a subclass of feedforward neural networks that employ convolution operations to condense high-dimensional data into representative vectors. The work asserts that the CNN model improved the handling of the imbalanced dataset, and that it not only reduced the false alarm rate but also improved per-class accuracy even when the sample size was small. According to their experiment report, the CNN achieves an accuracy of 79.48% on NSL-KDD and outperforms several conventional machine learning techniques proposed in previous works.
A Generative Adversarial Network (GAN) and AE were combined in a novel IDS detection model evaluated on NSL-KDD [32]. By applying a semi-supervised model, the authors reduced the time and effort required to label data manually and increased the effectiveness of malicious detection without labeled data. They report that GANs and AEs improved IDS detection on the NSL-KDD datasets even when only 0.1% of the data was labeled.
Long Short-Term Memory (LSTM) is a type of recurrent neural network designed to handle sequential data [33], [34]; it is an enhancement of the basic recurrent neural network. LSTM has recently been considered as a model for IDS networks, such as the so-called DL-IDS [26], which has a reported accuracy of 98.67%. In an experiment on a hybrid PCA-LSTM model [35], PCA is responsible for reducing the dimensionality of the raw attack data, while LSTM is tasked with classifying network attacks. The authors report that PCA-LSTM achieves 99.45% accuracy in binary-class and 99.39% accuracy in multi-class classification, and that reducing the number of dimensions with PCA improved LSTM performance. They also proposed combining mutual information (MI) with LSTM, which reaches 96.24% binary-class accuracy and 95.56% multi-class accuracy.

II. MATERIAL AND METHOD
This study uses the NSL-KDD datasets to assess the efficacy of SDAE combined with Naive Bayes, KNN, SVM, and Decision Tree. The datasets are widely used in IDS detection research. A detailed explanation of the datasets is provided below.

A. NSL-KDD datasets explanation
NSL-KDD is an improved version of the KDD99 datasets. The datasets are widely used to benchmark IDS network detection systems. NSL-KDD addresses some shortcomings of the original KDD99 datasets, such as the redundant and duplicate records in the train and test sets, which bias classifiers toward frequent samples. The dataset is made freely available by the Canadian Institute for Cybersecurity [36]. The datasets are divided into training and testing sets, denoted KDDTrain+ and KDDTest+ respectively, with a total of 125973 training records and 22544 testing records. Because KDDTest+ contains 17 additional attack types that do not appear in KDDTrain+, the researchers removed the 3751 records belonging to those types in order to obtain a fair classification result. KDDTest+ was thus reduced to 22544 - 3751 = 18793 records. Table 1 shows the detailed characteristics of KDDTrain+ and KDDTest+. Each NSL-KDD record has features z_f (f = 1, 2, 3, ..., 41), comprising three symbolic attributes and 38 continuous attributes. The NSL-KDD datasets are divided into four attack class categories, as described below:
• Denial of Service (DoS): A DoS attack tries to make a network service, server, or other resource unreachable for legitimate users by flooding it with traffic. In a DoS attack, the attacker can slow down or shut down a server or network service.
• Remote to Local (R2L): R2L attacks send illegitimate packets from a remote machine to a server or computer system in order to gain access to it without permission.
• User to Root (U2R): A group of attacks that aim to obtain "root" access on a computer. The attacker logs in as a normal user and exploits a flaw in the system to gain root privileges.
• Probe: An attack category that gathers information about networks and security management systems without authorization.
Table 1 summarizes each attack category in detail.

B. Data Pre-processing
Data pre-processing aims to transform the data into a standard form so that it can be passed properly to the next stage. It also ensures that the machine learning algorithm can recognize the feature characteristics. To achieve this goal, pre-processing is divided into three steps: outlier removal, data normalization, and transformation of symbolic attributes using one-hot-encoding.

1) Removing outliers:
Some values in NSL-KDD are inconsistent with the rest of the data; such values are called outliers. Removing them is an essential procedure before the data normalization step, because outliers may affect the proposed malicious detection model and result in incorrect detection. We considered using the Median Absolute Deviation Estimator (MADE). In its standard form, the estimator is computed for each attribute z_f as

$$\mathrm{MAD}(z_f) = b \cdot \operatorname{median}_j \left( \left| z_{fj} - \operatorname{median}(z_f) \right| \right),$$

where b = 1.4826 is the usual consistency constant for normally distributed data. Values that lie far from the median relative to the MAD are treated as outliers.

2) Data normalization:
As part of the normalization process, the min-max method is used to scale each numerical attribute value z_{fj} into the range 0-1 with the following equation:

$$z'_{fj} = \frac{z_{fj} - \min(z_f)}{\max(z_f) - \min(z_f)}$$

3) One-hot-encoding:
Protocol type, service, and flag (z2, z3, z4) are three symbolic attributes that require special handling. To convert them into numeric values, the one-hot-encoding method is required, in which every categorical value is represented by a binary vector. For example, protocol type has three categories: udp, icmp, and tcp. One-hot-encoding transforms them into the binary vectors (1,0,0), (0,1,0), and (0,0,1). The same conversion was applied to the service and flag features, z3 and z4. In total, the 41 original features were expanded into 122-dimensional features, consisting of 84 binary dimensions and 38 continuous values.
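The three pre-processing steps can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the cutoff `k` (how many scaled MADs from the median count as an outlier), and the toy `durations` list are all hypothetical, since the paper does not state them.

```python
from statistics import median

PROTOCOLS = ["udp", "icmp", "tcp"]  # category order as listed in the text


def mad_inlier_indices(values, k=10.0, b=1.4826):
    """Indices of values within k scaled MADs of the median.

    k is a hypothetical cutoff; b is the usual consistency constant.
    """
    med = median(values)
    mad = b * median([abs(v - med) for v in values])
    if mad == 0:
        # MAD is zero when at least half the values equal the median;
        # in that case nothing is dropped.
        return list(range(len(values)))
    return [i for i, v in enumerate(values) if abs(v - med) <= k * mad]


def min_max(values):
    """Scale numeric values into the range 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(value, categories):
    """Binary vector with a single 1 at the position of value."""
    return [1 if value == c else 0 for c in categories]


# Toy numeric attribute with one extreme value (made-up data).
durations = [0, 2, 1, 3, 2, 500]
keep = mad_inlier_indices(durations)           # drops the value 500
scaled = min_max([durations[i] for i in keep])
encoded = one_hot("tcp", PROTOCOLS)

# Dimension count stated in the text: 84 binary + 38 continuous = 122.
total_dims = 84 + 38
```

In a real pipeline the minimum and maximum would normally be computed on the training set and reused for the test set, so that both are scaled consistently.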

4) Dimensional reduction using SDAE:
SDAE belongs to the auto encoder (AE) family of neural networks. An AE takes the input and transforms it into a hidden layer representation using a deterministic mapping, while a denoising autoencoder is trained to recover the original input from a corrupted version of it [28]. The model aims to address the difficulty of training deep auto encoders in an unsupervised manner, where feature inputs are mapped into intermediate representations. According to the literature, several autoencoder variants have been proposed and have demonstrated strong results in computer science research [29]. Furthermore, denoising autoencoders can be stacked to form a deep network in which each layer learns a higher-level representation; this is known as a stacked denoising autoencoder. For its learning mechanism, SDAE uses regularization to address the optimization problem.
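For reference, one layer of a denoising autoencoder is commonly written as below. This is the textbook formulation, given here as an assumption about the general technique rather than the exact configuration used in this study; the corruption process q, the activation s, the weights W, W', the biases b, b', and the regularization weight λ are symbols introduced only for this sketch.

```latex
\begin{aligned}
\tilde{z} &\sim q(\tilde{z} \mid z) && \text{(corrupt the input, e.g. by masking or adding noise)} \\
h &= s(W \tilde{z} + b) && \text{(encode into the hidden representation)} \\
\hat{z} &= s(W' h + b') && \text{(decode back to input space)} \\
\min_{W, W', b, b'} \;& \frac{1}{n} \sum_{i=1}^{n} \lVert z_i - \hat{z}_i \rVert^2
  + \lambda \left( \lVert W \rVert^2 + \lVert W' \rVert^2 \right)
  && \text{(reconstruct the clean input, with weight regularization)}
\end{aligned}
```

In a stacked model, the hidden representation h of one trained layer becomes the input z of the next, and the output of the last encoder serves as the reduced feature vector passed to the classifier.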

C. IDS Detection Classifier
This research incorporated four traditional classifier algorithms to observe the model's performance. The dimensional reduction using SDAE was integrated with Naive Bayes, KNN, Decision Tree, and SVM. The basic mechanism of each algorithm is explained below.

1) Naive Bayes:
When dealing with binary (two-class) or multi-class classification problems, the Naive Bayes (NB) algorithm is a common choice. The technique is easiest to understand with binary or categorical input values. It is called naive (or idiot) Bayes because the calculation of the probabilities for each hypothesis is simplified to make it tractable. Rather than attempting to calculate the joint probability of the attribute values, P(d1, d2, d3|h), the attributes are assumed to be conditionally independent given the target value, and the probability is calculated as P(d1|h) * P(d2|h) and so on.
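Written out, the independence assumption described above gives the usual Naive Bayes decision rule. The notation follows the text (h for a hypothesis or class, d_1, ..., d_n for the attribute values); the set H of candidate classes is a symbol added for this sketch.

```latex
P(h \mid d_1, \dots, d_n) \;\propto\; P(h) \prod_{i=1}^{n} P(d_i \mid h),
\qquad
\hat{h} = \arg\max_{h \in H} \; P(h) \prod_{i=1}^{n} P(d_i \mid h)
```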

2) K-nearest neighbors (KNN):
KNN is one of the simplest supervised machine learning algorithms. It predicts the class of a data sample by considering "feature similarity": to classify a sample, it calculates the distance from that sample to the other samples and takes the classes of the k nearest ones into account. The parameter k can affect the model's performance. With very small k values, the model may suffer from over-fitting. With a very large k value, the sample may be incorrectly categorized [37], [38], [39].
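The effect of k can be seen in a minimal majority-vote KNN written in plain Python. The function name, the two-dimensional toy points, and the single mislabeled "noisy" point are hypothetical and chosen only to illustrate the behavior described above; Euclidean distance is assumed.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to query."""
    ranked = sorted(zip(train_x, train_y), key=lambda p: dist(p[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up training set: three normal points, three anomalies,
# and one noisy anomaly label sitting inside the normal cluster.
train_x = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.2),
           (0.9, 1.0), (1.0, 0.8), (0.8, 0.9),
           (0.15, 0.15)]
train_y = ["normal", "normal", "normal",
           "anomaly", "anomaly", "anomaly",
           "anomaly"]

query = (0.16, 0.14)  # lies in the normal cluster
pred_k1 = knn_predict(train_x, train_y, query, k=1)  # follows the noisy point
pred_k3 = knn_predict(train_x, train_y, query, k=3)  # local majority
pred_k7 = knn_predict(train_x, train_y, query, k=7)  # global majority
```

With k=1 the prediction follows the single noisy neighbor (over-fitting), with k=3 the local majority recovers "normal", and with k=7 every training point votes, so the overall majority class wins regardless of where the query lies.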

3) Decision Tree:
A Decision Tree (DT) is a fundamental supervised machine learning algorithm that can be applied to both classification and regression problems by learning rules from a given dataset. Nodes, branches, and leaves make up the tree-like structure of the model. Each internal node is a feature or an attribute, each branch represents a rule or decision, and each leaf represents a possible outcome or classification. The decision tree algorithm automatically selects the best features for building the tree and then, to prevent over-fitting, performs pruning operations to remove irrelevant branches. The three most widely used decision tree models are CART, C4.5, and ID3 [40], [41].
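How a tree "selects the best feature" can be illustrated with a single split search. The sketch below uses Gini impurity, the criterion associated with CART; ID3 and C4.5 rely on information gain and gain ratio instead. The function names and the toy attribute values are hypothetical, and the paper does not state which criterion its Decision Tree used.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted Gini) for the best split value <= threshold."""
    best = (None, float("inf"))
    # The largest value is skipped because it would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(values)
        if score < best[1]:
            best = (t, score)
    return best


# Made-up normalized attribute that separates the two classes cleanly.
feature = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
target = ["normal", "normal", "normal", "anomaly", "anomaly", "anomaly"]
threshold, score = best_threshold(feature, target)
```

A full tree repeats this search over every attribute at every node and recurses on the two resulting subsets until a stopping or pruning rule applies.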

4) Support Vector Machine (SVM):
SVM is a margin-based classification method that constructs an optimal hyperplane separating the different classes as widely as possible, following the principle of structural risk minimization [28]. As a result, SVM has a powerful generalization capability and is resistant to overfitting issues. Furthermore, SVM can deal with nonlinear classification problems by selecting kernel functions that map the original feature space to a high-dimensional feature space in which the instances are linearly separable.
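The margin principle and the kernel mapping described above correspond to the standard soft-margin formulation below. The symbols (weight vector w, bias b, slack variables ξ_i, penalty C, feature map φ, kernel K, dual coefficients α_i, labels y_i in {-1, +1}) are introduced for this sketch, and the RBF kernel is shown only as a common example; the kernel used in this study is not stated.

```latex
\begin{aligned}
\min_{w, b, \xi} \;& \tfrac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i \\
\text{s.t. } & y_i \left( w^{\top} \phi(x_i) + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \\[4pt]
f(x) &= \operatorname{sign} \left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right),
\qquad K(x_i, x) = \phi(x_i)^{\top} \phi(x), \\[4pt]
K_{\mathrm{RBF}}(x_i, x) &= \exp \left( -\gamma \lVert x_i - x \rVert^2 \right)
\end{aligned}
```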

D. Hybrid SDAE with Naive Bayes, KNN, Decision Tree, and SVM
Our study combines SDAE with popular traditional machine learning approaches in order to observe the effectiveness of each combination. The hybridization scheme can be seen in Figure 4. Our experiment consists of several evaluation processes, covering multi-class and binary-class classification and using a confusion matrix, accuracy, recall, F1-measure, and precision. The multi-class experiment consists of five categories: normal, DoS, Probe, U2R, and R2L, while the binary-class experiment consists of two: normal and anomaly. We compared four traditional machine learning models: KNN, Naive Bayes, Decision Tree, and SVM. Each was then integrated with SDAE-based dimensional reduction. SDAE is an enhancement of the Auto Encoder model. The advantage of Auto Encoder variants is their usefulness as feature extraction mechanisms, and they belong to the category of modern deep learning. Our training scheme divided NSL-KDD into 30% and 70% portions, a ratio that most researchers in IDS detection have used.

E. Evaluation Metrics
The evaluation metrics are built on four counts taken from the confusion matrix. TP (true positive) is the number of abnormal samples that tested positive (accurate detection). TN (true negative) is the number of normal samples that tested negative (accurate detection). FP (false positive) is the number of normal samples that tested positive (inaccurate detection). FN (false negative) is the number of abnormal samples that tested negative (inaccurate detection).
Accuracy is defined as the ratio of correctly classified samples to all samples in the testing set, expressed as a percentage. Precision is the ratio of TP samples to the total number of TP and FP samples in the testing set, expressed as a percentage. Recall is the ratio of the number of TP samples to the total number of TP and FN samples. The F1-score is the harmonic mean of precision and recall.
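The four metrics can be computed directly from the confusion-matrix counts, as in the short Python sketch below. The function name is hypothetical. The example counts are the binary-class figures quoted in the results section (9341 normal records detected correctly with 946 missed, 7274 anomalies detected correctly with 1704 missed); mapping them to TN, FP, TP, and FN in that order is an assumption made for this illustration.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Counts assumed from the binary-class confusion matrix in the results.
m = binary_metrics(tp=7274, tn=9341, fp=946, fn=1704)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
```

Under this mapping the accuracy evaluates to about 86.2%, which agrees with the figure reported for the SDAE and SVM combination in the binary-class confusion matrix evaluation.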

III. RESULTS AND ANALYSIS
The result of dimensional reduction using SDAE can be seen in Fig. 5. The dark colors represent values that are close to the actual values, while the bright ones represent values that are very different from the actual values. The output of the SDAE dimensional reduction was then fed into the four machine learning classifiers. The evaluation metrics include accuracy, precision, recall, and F1, as shown in Table 2. Our experiment covered two settings, multi-class and binary class: the binary class only distinguishes anomaly from normal traffic, while the multi-class setting involves five categories: "Normal", "DoS", "Probe", "R2L", and "U2R".
As shown in Table 2, dimensional reduction using SDAE succeeded in increasing the effectiveness of traditional machine learning in IDS detection. The hybridization between SDAE and KNN achieved an accuracy of 79.8%, compared with 77.9% for KNN without SDAE. The hybridization between SDAE and Naive Bayes also outperformed the traditional Naive Bayes without SDAE, reaching 80.5% compared with 76.3%. The Decision Tree combined with SDAE achieved an accuracy of 83.4%, while the one without SDAE reached 82.9%. Our experiment report shows that SDAE and SVM achieved the best performance at 84.1%, whereas the traditional SVM only achieved an accuracy of 80%.
The multi-class training result shows that combining SDAE with the four machine learning classifiers also outperformed traditional machine learning. The hybridization between SDAE and KNN reached an accuracy of 78.1%, while KNN without SDAE only achieved 75%. The hybridization between SDAE and Naive Bayes reached 78.7%, compared with 77.8% for the traditional Naive Bayes. The hybridization between Decision Tree and SDAE reached 82.8%, more than 2 percentage points higher than the traditional Decision Tree, which only reached 80.1%. The best result in our experiment came from the hybridization between SDAE and SVM, with an accuracy of 83.3%. This means that SDAE and SVM increased the effectiveness of IDS detection by more than 3% compared with the traditional SVM that only employed pre-processing.
Our study also applied a confusion matrix to assess the effectiveness of our model. A confusion matrix was computed for each hybridization model and evaluated under both the multi-class and binary-class classification approaches. The binary-class results are shown in Fig. 6 to 13, while the multi-class results can be seen in Fig. 14 to 21.
Fig. 6 to 13 show that SDAE reduced misclassification in every hybridization scenario, including SDAE with KNN, Naive Bayes, Decision Tree, and SVM. Hybridization between SDAE and KNN increased detection accuracy from 79% to 81%. The combination of SDAE and Naive Bayes achieved 82.9%, while traditional pre-processing and Naive Bayes only reached 81.7%. The combination of SDAE and Decision Tree outperformed the preceding KNN and Naive Bayes combinations, reaching 85.5%, while the traditional Decision Tree with pre-processing only reached 82.1%. The hybridization between SDAE and SVM gave the best performance with an accuracy of 86.2%, while traditional pre-processing and SVM reached 82.1%. The employment of SDAE proved more effective in every hybridization scenario in multi-class classification. This model also correctly detected 9341 normal network traffic records with 946 misclassified, and correctly detected 7274 anomalies with 1704 misclassified.
The experiment report based on the confusion matrix for multi-class classification is shown in Fig. 14 to 21. Each figure shows that SDAE reduced misclassification. SDAE raised the accuracy of KNN in the confusion matrix evaluation to 74%, while the traditional KNN with pre-processing only reached 72%. The combination of SDAE and Naive Bayes also increased performance in multi-class IDS detection, achieving an accuracy of 79.9% compared with 77.9% for Naive Bayes with pre-processing. The Decision Tree with SDAE reduced misclassification and achieved 83.2% in the confusion matrix evaluation, whereas the Decision Tree without SDAE only reached 82%. The hybridization of SDAE and SVM, evaluated using a confusion matrix, performed best among the hybridization approaches: SDAE-SVM reduced misclassification and raised accuracy to 87%, compared with 84% for pre-processing and SVM only.
This study also compared the results with the previous state of the art. The competing models use several novel methods based on statistical and deep learning approaches, for instance, the hybridization of statistical models with machine learning, the combination of CNN and LSTM, LSTM and mutual information, and LSTM and PCA. The comparison is shown in Table 3.

IV. CONCLUSION
This study enhances dimensional reduction using a variant of auto encoder based on SDAE. The model is found to be useful for improving traditional machine learning, and SDAE is suitable for reducing misclassification in traditional classifiers such as KNN, Naive Bayes, Decision Tree, and SVM. SDAE with SVM was the best combination in our experiment, ahead of Decision Tree (the second best), Naive Bayes, and KNN. SDAE also increased the effectiveness of machine learning classification in IDS detection, even when compared with modern deep learning approaches based on CNN and LSTM, in both binary and multi-class classification.
In future research, SDAE can be integrated with modern deep learning approaches such as MLP, LSTM, CNN, and GAN to reduce misclassification and increase the number of correct predictions. Our model, which was developed using traditional machine learning, could also be improved with an ensemble learning approach.

ACKNOWLEDGMENT
This study is supported by Universitas Amikom Yogyakarta, Indonesia.