ON INFORMATICS



I. INTRODUCTION
The Internet of Things (IoT) is one of the newer trends in the technology world. Simply put, IoT continuously connects physical devices such as CCTV cameras, lights, televisions, refrigerators, and even house doors to the Internet, so that they can be controlled remotely via a smartphone [1], monitored [2], [3], or used to send information to other devices [4], [5]. Recently, many attacks exploiting IoT device vulnerabilities have been reported [6]-[8]. Because of this rapid development, problems have emerged that harm IoT devices, one of which is the DDoS attack carried out by the Mirai malware [9]-[11]. The Mirai botnet exploits firmware vulnerabilities in IoT devices currently on the market to turn them into a network of remotely controlled bots. Once infected, IoT devices scan the network for other vulnerable devices, focusing on Internet-connected devices such as IP cameras and home routers. As DDoS attacks [12] on IoT devices become increasingly varied, research is needed that examines the characteristics of attacks carried out on IoT devices [13]. This study is one such effort: it classifies Mirai malware attacks to determine their characteristics so that they can be used as parameters for an early warning system.
Mirai is a malicious program [14] and one of the most dangerous pieces of malware in recent years. It has even been used for the most significant DDoS attack ever recorded [15]. DDoS attacks launched from IoT devices by the Mirai botnet tend to be large and disruptive [16], so addressing the Mirai botnet threat is a pressing issue.
Machine learning with RapidMiner is one solution for creating a mechanism to detect and identify attacks [17]. In addition, machine learning provides data identification functions in IoT networks [18].
This study uses a public dataset from the UCI Repository. The data tested is limited to Mirai malware attacks on one type of IoT device, security cameras. The N-BaIoT dataset is collected from raw network traffic data in packet capture format. Each security camera device has six datasets: one Benign dataset containing regular traffic, and five attack traffic datasets, namely Scan, ACK, SYN, UDP, and UDPplain [19].
In the research by Čolaković and Hadžialić [13], the classification process was carried out manually rather than by machine learning, which made the process less effective and less efficient. The datasets also differ markedly: the data processed in [13] is very different from the data in the study of Meidan et al. [19], which takes the form of packet captures on each IoT device. However, the research by Čolaković and Hadžialić [13] has the advantage that the information obtained is more complete than in the research by Meidan et al. [19]. The primary purpose of this research is to apply the K-Nearest Neighbor algorithm to identify Mirai botnet malware attacks, including Scan, ACK, SYN, UDP, and UDPplain, on security camera IoT devices. In the future, the classification results can be used as reference data for an early warning system (EWS) on IoT devices to identify and prevent Mirai malware attacks.
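To make the K-Nearest Neighbor idea concrete, the following is a minimal sketch of the general algorithm in plain Python. It is not the RapidMiner configuration used in this study; the function names, the choice of Euclidean distance, the value of k, and the toy traffic values are all assumptions made for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest training rows."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: euclidean(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up traffic statistics (not taken from N-BaIoT):
# low values stand for benign traffic, high values for SYN flood traffic.
train_X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25],
           [5.0, 4.8], [5.2, 5.1], [4.9, 5.3]]
train_y = ["benign", "benign", "benign", "syn", "syn", "syn"]

print(knn_predict(train_X, train_y, [0.0, 0.3]))
print(knn_predict(train_X, train_y, [5.1, 5.0]))
```

With an odd k and two classes, the vote cannot tie, which is one reason small odd values of k are a common starting point for binary benign-versus-attack classification.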

A. Related Works
The earlier research by Čolaković and Hadžialić [13] performed direct calculations using the K-Nearest Neighbor formula to detect botnet traffic in the CTU-13 dataset. The algorithm in that study was used to detect the Mirai malware attack anomaly. In contrast, the research by Meidan et al. [19] was used to detect the characteristics of attacks as an early warning system [13], using eight types of botnets/malware (Zeus, Conficker, Dridex, Necurs, Miuref, Bunitu, Upatre, and Trickbot). The research by Čolaković and Hadžialić [13] has the advantage that, because the calculations are manual, the accuracy of the results can be measured.
Both Čolaković and Hadžialić [13] and Meidan et al. [19] employed public datasets from the UCI library. However, the attacks and the data content differ. For example, the research by Meidan et al. [19] uses a dataset of network traffic logs, while the dataset in the present research takes the form of packet capture logs of Mirai malware attacks on one class of IoT device, the security camera.

B. Method
The method used to implement this algorithm has four stages, shown in Figure 1: literature study, data collection, data processing, and modeling.

1) Study of literature:
This stage describes the theory, findings, and other research materials obtained from international and national journals [20]. The literature study serves as the basis for the research activities, providing a clear frame of mind from the formulation of the problem onward. The literature reviewed consists of journal articles on malware attack analysis and machine learning.
2) Data Collection: at this stage, data is collected from various sources [21]. The only sources used are public datasets from the UCI Repository. All information in the dataset is collected and organized by function and type. The data collection process in this study is described in Figure 2.
Datasets: at this stage, the data set is collected and determined; it already contains the research source data and instructions for conducting research.
Table 3 shows the number of attacks in the N-BaIoT (detection of IoT botnet attacks) dataset for each IoT device. Each device has six datasets, consisting of one standard traffic dataset (Benign) and five Mirai attack traffic datasets (Scan, ACK, SYN, UDP, and UDPplain).
Packet Stream: this is the sub-data residing in the data set. Eight statistics are extracted from the packet stream in the N-BaIoT dataset. Table 4 lists the various types of packet flows and their descriptions. Among these are the estimated covariance between the two streams and Pcc, the estimated correlation coefficient between the two streams. This packet flow is contained in the aggregation. The packet flow value is a number converted from raw network traffic.
Aggregation Stream: Table 5 shows the breakdown of aggregation streams. These five aggregations are the most recent traffic recorded.
Table 7 lists the features in the database, each of which is a combination of a packet stream, an aggregation stream, and a time frame. The organization of the features by packet statistics can be seen in Table 8; the features fall into four groups.
3) Data Process: Data processing is the third stage in the research, namely processing large and unbalanced data into datasets that can be used for testing. Figure 3 shows the stages of data processing, which begin by loading the regular traffic and attack traffic datasets. Combining them produces an unbalanced dataset. The dataset is then sampled so that regular traffic and attack traffic are balanced. However, the dimensionality of the dataset is still high, so features are selected to reduce the dimensionality and optimize accuracy. Data processing is performed with the RapidMiner application.
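The balancing and feature selection steps described above can be sketched in plain Python as follows. This is an illustrative approximation, not the RapidMiner process used in the study: the helper names, the random downsampling strategy, and the ranking score (absolute difference of class means divided by the overall standard deviation) are assumptions standing in for whatever operators the RapidMiner workflow applies, and the toy rows are made up.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def balance_by_downsampling(rows, labels, seed=0):
    """Randomly downsample every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def top_features(rows, labels, names, positive, n=5):
    """Rank features by a stand-in score and return the n best names."""
    scored = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        pos = [row[j] for row, lab in zip(rows, labels) if lab == positive]
        neg = [row[j] for row, lab in zip(rows, labels) if lab != positive]
        spread = pstdev(column) or 1.0  # avoid division by zero on constant columns
        scored.append((abs(mean(pos) - mean(neg)) / spread, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:n]]


# Toy, made-up data: six benign rows and two attack rows (unbalanced).
names = ["f_a", "f_b", "f_c"]  # hypothetical feature names
rows = [
    [1.0, 5.0, 0.0], [1.1, 6.0, 1.0], [0.9, 5.0, 0.0],
    [1.0, 6.0, 1.0], [1.2, 5.0, 0.0], [0.8, 6.0, 1.0],
    [9.0, 5.0, 0.0], [9.5, 6.0, 1.0],
]
labels = ["benign"] * 6 + ["attack"] * 2

bal_rows, bal_labels = balance_by_downsampling(rows, labels)
print(Counter(bal_labels))
print(top_features(bal_rows, bal_labels, names, positive="attack", n=1))
```

Downsampling discards majority-class rows, which is acceptable when the dataset is large; oversampling the minority class would be an alternative design choice.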

A. Identification Scenario Process
The identification process involves two datasets for each device: regular traffic (Benign) and attack traffic (ACK, SYN, UDP, or UDPplain). After all devices were tested, the results were the same for every type of attack on each device. Therefore, to save time, each IoT device is tested against one attack.
Table 9 shows the scenario of each device's identification process against an attack type. The Scan data is not identified because it is not DDoS attack data; it is only traffic that probes IoT devices for weaknesses.

B. Modeling
1) Provision PT-737E device modeling (benign & SYN attack):
Device modeling aims to turn the two processed datasets (benign & SYN attack) into balanced data so that they can be identified. Figure 4 is a model for selecting the five parameters with the highest activity to identify SYN-type Mirai attacks. Table 10 shows the selected features: the three highest activity parameters for the Host-MAC&IP category and the two highest for Host-IP. The five highest activity parameters cover three time frames: 1.5 seconds, 500 milliseconds, and 100 milliseconds. Their packet flow statistic is of one type, namely Weight. The five selected parameters can be interpreted as the network traffic condition of an IoT device under a Mirai botnet DDoS attack. For example, if a network device's traffic matches the five selected parameters, it can be interpreted that a Mirai DDoS SYN attack has hit the device.
2) Provision PT-838 device modeling (benign & ACK): Two datasets, benign & ACK, are processed into balanced data so that they can be identified. The feature selection model is shown in Figure 5.

| Features | Description |
| H_L0.01_Variance | Host IP 100ms |
| H_L0.1_Mean | Host IP 500ms |
| H_L0.1_Weight | Host IP 500ms |
| MI_dir_L0.01_Variance | Host MAC&IP 100ms |
| MI_dir_L0.1_Weight | Host MAC&IP 500ms |

Table 11 shows the selected features: the two highest activity parameters for the Host-MAC&IP category and three for Host-IP. The five highest activity parameters cover two time frames, 500 milliseconds and 100 milliseconds. Their packet flow statistics are of three types, namely Weight, mean, and variance. The five selected parameters can be interpreted as the network traffic condition of an IoT device under a Mirai botnet DDoS ACK attack. If a network device's traffic matches the five selected parameters, it can be interpreted that a Mirai DDoS ACK attack has hit the device.
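The feature names in these tables encode a category, a time frame, and a statistic. The sketch below shows one way to decode them. The mappings contain only what the tables in this paper state (H for Host IP, MI_dir for Host MAC&IP, and the L0.01, L0.1, and L1 codes); the function name and the "unknown" fallback are assumptions for illustration.

```python
# Mappings taken from the feature descriptions in this paper's tables.
CATEGORIES = {"H": "Host IP", "MI_dir": "Host MAC&IP"}
TIME_FRAMES = {"L0.01": "100ms", "L0.1": "500ms", "L1": "1.5s"}


def describe_feature(name):
    """Split a name such as 'MI_dir_L0.1_Weight' into (category, time frame, statistic)."""
    # Try longer prefixes first so 'MI_dir' is not confused with a shorter one.
    for prefix in sorted(CATEGORIES, key=len, reverse=True):
        if name.startswith(prefix + "_"):
            rest = name[len(prefix) + 1:]
            window, _, statistic = rest.partition("_")
            # Codes not listed in the tables are reported as "unknown", not guessed.
            return CATEGORIES[prefix], TIME_FRAMES.get(window, "unknown"), statistic
    raise ValueError(f"unrecognized feature prefix in {name!r}")


for feature in ["H_L0.01_Variance", "H_L0.1_Mean", "MI_dir_L0.1_Weight", "H_L1_Weight"]:
    print(feature, "->", describe_feature(feature))
```

A decoder like this could let an early warning system report which category and time frame triggered an alert instead of a raw column name.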
3) Simple Home XCS7-1002-WHT (benign & UDP attack) device modeling: Device modeling aims to turn the two processed datasets (benign & UDP attack) into balanced data so that they can be identified.

| Features | Description |
| H_L0.1_Variance | Host IP 500ms |
| H_L0.1_Mean | Host IP 500ms |
| MI_dir_L0.1_Variance | Host MAC&IP 500ms |
| MI_dir_L0.1_Mean | Host MAC&IP 500ms |
| MI_dir_L0.1_Weight | Host MAC&IP 500ms |

Table 12 shows the selected features: the two highest activity parameters for the Host-IP category and three for the Host-MAC&IP category. The five highest activity parameters share one time frame, namely 500 milliseconds. Their packet flow statistics are of three types, namely Weight, mean, and variance. The five selected parameters can be interpreted as the network traffic condition of an IoT device under a Mirai botnet DDoS UDP attack. If a network device's traffic matches the five selected parameters, it can be interpreted that a Mirai DDoS UDP attack has hit the device.

4) Simple Home XCS7-1003-WHT (benign & UDPplain attack) device modeling:
The modeling of the processed benign & UDPplain data for this device is shown in Figure 7.

| Features | Description |
| H_L0.01_Weight | Host IP 100ms |
| H_L0.01_Weight | Host IP 100ms |
| H_L1_Weight | Host IP 1.5s |
| MI_dir_L0.01_Weight | Host MAC&IP 100ms |
| MI_dir_L0.1_Weight | Host MAC&IP 500ms |

Table 13 shows the selected features, which include one highest activity parameter in the Host-MAC&IP category. The five highest activity parameters cover three time frames: 500 milliseconds, 100 milliseconds, and 1.5 seconds. Their packet flow statistic is of one type, namely Weight. The five selected parameters can be interpreted as the network traffic condition of an IoT device under a Mirai botnet DDoS attack.

C. Overall Classification Results
Table 14 shows the result of classifying the five highest activity parameters as device parameters when exposed to DDoS attacks. The selected highest activity parameters can be used for an early warning system on a device because they indicate whether or not the device is under a DDoS attack, so that prevention and control can be carried out optimally.

IV. CONCLUSION
Based on the test results, the K-Nearest Neighbor algorithm has successfully classified DDoS attacks of all tested types, namely SYN, ACK, UDP, and UDPplain. Furthermore, all test results on these IoT devices show the same characteristics when tested with several DDoS attacks. This shows that the Mirai malware has been identified successfully, so the parameters obtained can be developed further for an early warning system that detects the Mirai botnet malware in the IoT environment.

Fig. 2 Data Collection Stages
IoT Devices: at this stage, the Internet of Things devices selected for research are collected and determined.

Fig. 3 Data Processing Stage
4) Modeling: This part is explained in the next section.

Fig. 6 Modeling Selection Feature
Fig. 6 is a model for selecting the five highest activity parameters to identify DDoS attacks. The results can be seen in Table 12.

Table 1 is a list of IoT devices infected with the Mirai botnet. Four IoT devices in the N-BaIoT dataset are infected with the Mirai botnet. The table entries include:
- (device name missing): wireless support 802.11b/g/n, port 80 UDP, camera quality 1MP (720p), Kode Pro 7
- Provision PT-838: wireless support 802.11b/g/n, port 80 UDP, camera quality 2MP (1080p), Kode Pro 8
Type of Attack: at this stage, the type of DDoS attack to be investigated is collected and determined. The Mirai attack is chosen for this research.

TABLE II TYPE OF ATTACK

Table 2 lists the types of Mirai botnet attacks launched on IoT devices. Four of the attack types work by flooding the IoT device server, and the remaining type scans for vulnerable IoT devices automatically.
Table 6 shows the time frames contained in the dataset's features. The five time frames are used to detect the Mirai malware in real time.
Features: each dataset consists of 23 main features computed over five time frames (1 minute, 10 seconds, 1.5 seconds, 500 milliseconds, and 100 milliseconds). The number of features in each dataset is therefore 23 × 5 = 115.
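The count of 115 features can be checked by enumerating the combinations. The sketch below assumes the column naming commonly seen in the public N-BaIoT files (stream prefixes H, MI_dir, HH, HH_jit, and HpHp, and window codes L5, L3, L1, L0.1, and L0.01); these names and the per-stream statistic lists are assumptions that are not stated in this paper and should be verified against the dataset files.

```python
# Assumed per-stream statistics (to be verified against the dataset header).
STATS_BY_STREAM = {
    "H": ["weight", "mean", "variance"],
    "MI_dir": ["weight", "mean", "variance"],
    "HH": ["weight", "mean", "std", "magnitude", "radius", "covariance", "pcc"],
    "HH_jit": ["weight", "mean", "variance"],
    "HpHp": ["weight", "mean", "std", "magnitude", "radius", "covariance", "pcc"],
}
# Assumed window codes, one per time frame.
WINDOWS = ["L5", "L3", "L1", "L0.1", "L0.01"]

# 23 main features: one per (stream, statistic) pair.
base_features = [(stream, stat)
                 for stream, stats in STATS_BY_STREAM.items()
                 for stat in stats]

# 115 columns: every main feature evaluated in every time frame.
feature_names = [f"{stream}_{window}_{stat}"
                 for window in WINDOWS
                 for stream, stat in base_features]

distinct_stats = {stat for _, stat in base_features}
print(len(base_features), len(feature_names), len(distinct_stats))
```

Under these assumptions the enumeration yields 23 main features, 115 columns, and eight distinct statistics, which agrees with the counts given in the text.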

TABLE XI ACK ATTACK CLASSIFICATION RESULTS

TABLE XIII UDPPLAIN ATTACK CLASSIFICATION RESULTS