ON INFORMATICS

Abstract: Ransomware attacks are rising as technology advances. This threat often affects the finances of individuals, organizations, and the financial sector. To detect and block ransomware effectively, this research adopts a dynamic analysis strategy. This paper aims to detect ransomware attacks with dynamic analysis and to classify the attacks using several machine learning classifiers: Random Forest, Naïve Bayes, J48, Decision Table, and Hoeffding Trees. The TON IoT Datasets from the University of New South Wales (UNSW) were used to capture ransomware attack features on Windows 7. For the experiment, a testbed was configured with numerous virtual Windows 7 machines and a single attacker host that carried out the ransomware attack. Seventy-seven classification features were selected based on the changes observed before and after the attack. The Random Forest and J48 classifiers outperformed the other classifiers, each achieving the highest accuracy of 99.74%. The confusion matrix shows that both classifiers classify ransomware attacks accurately, and each attains an AUC value of 0.997. The experimental results also suggest that dynamic analysis combined with a machine learning classifier is an effective way to detect ransomware, with accuracy exceeding 98%.


I. INTRODUCTION
Attackers currently employ sophisticated methods to create new types of profitable malware. Ransomware is a type of cyber-attack that has recently gained popularity. The goal of this malware is to encrypt user files, restrict access to them, and then demand a ransom for the decryption key [1], [2]. Ransomware has become a serious menace to the computing sector, and fast action is needed to avoid financial and moral extortion. A new technique to detect and prevent this type of attack is therefore critical. Three types of detection approaches have been used: dynamic analysis, static analysis, and a hybrid that combines the two [3]-[5]. Most previous detection methods relied on dynamic analysis, a procedure that is time-consuming but feasible and effective [6], [7].
Ransomware attacks first appeared in September 2013 using Rivest-Shamir-Adleman (RSA) public-key cryptography. The problem became catastrophic in 2016, when more than 1,400,000 Kaspersky users across numerous industries were targeted. Since the "WannaCry" ransomware infected over 400,000 machines across 150 countries in a single day in 2017, researchers have concentrated their efforts on ransomware detection. Despite this, the share of enterprises worldwide that fell victim to ransomware has climbed since 2018, reaching a high of 68.5 percent in 2021 [8]. Figure 1 depicts the global business victimization rate from 2018 to 2021. Over the years, various machine learning techniques and frameworks for ransomware detection have been developed and tested. According to multiple studies, ransomware detection rates can exceed 96 percent when a machine learning algorithm is combined with a dynamic analysis approach [9], [10]. Machine learning can also be used to analyze network traffic for Android malware, with detection rates exceeding 99 percent [11]. For this reason, this research focuses on detecting ransomware attacks on a host machine. Its primary contributions are as follows:
• Ransomware is detected through dynamic analysis of the machine state before and after the attack, using different types of machine learning classifiers.
• The accuracy of different machine learning classifiers in identifying ransomware attacks on a Windows 7 machine is compared.
• The Random Forest and J48 classifiers are shown to be the most accurate at determining whether a ransomware attack has occurred on a machine.
Omar et al. [12] observed that a ransomware-affected Windows 7 machine shows distinct characteristics in the network dialogues it establishes. The infected machine connects to a remote attacker's network address, a command-and-control server, and a payment or distribution website.
For this reason, we gathered a set of normal feature behaviors on the Windows 7 machine prior to ransomware infection and classified the features of the ransomware-affected Windows 7 machine using various classification approaches to identify the attacks.
Previously published research by Takeuchi et al. [13] used the Support Vector Machine (SVM) algorithm to detect ransomware. SVM is a supervised machine learning algorithm, and the approach trains the SVM to use API calls as ransomware detection features. The researchers conducted a testbed study by analyzing 276 types of ransomware, including WannaCry, PETYA, and CryptoLocker, in the Cuckoo Sandbox. The SVM achieved a ransomware detection accuracy of 97.48 percent, higher than the findings of Rieck et al. [14], and a missing rate of 1.64 percent, lower than the missing rate reported by Rieck et al. [14]. So that unknown ransomware can be recognized properly, the SVM approach implemented by Takeuchi et al. [13] thoroughly examines the API call sequences, and the authors' vector representations include the number of q-grams in the execution logs. As a result, ransomware is less likely to go undetected. As argued by Takeuchi et al. [13], dynamic analysis for ransomware detection tends to produce highly accurate results because it is more difficult for malware to hide its behavior than to hide its underlying code.
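The q-gram representation mentioned above can be illustrated with a minimal sketch. It is not the implementation of Takeuchi et al. [13]; the function names, the value q = 3, and the API call names in the sample log are assumptions chosen for illustration. Each run of q consecutive API calls is counted, and the counts are projected onto a fixed vocabulary so that every execution log yields a vector of the same length.

```python
from collections import Counter

def qgram_counts(api_calls, q=3):
    """Count every run of q consecutive API calls in one execution log."""
    grams = zip(*(api_calls[i:] for i in range(q)))
    return Counter(grams)

def to_vector(counts, vocabulary):
    """Project q-gram counts onto a fixed vocabulary so all logs share one layout."""
    return [counts.get(gram, 0) for gram in vocabulary]

# Hypothetical execution log (API names are illustrative, not taken from the paper).
log = ["CreateFile", "ReadFile", "CryptEncrypt", "WriteFile",
       "CreateFile", "ReadFile", "CryptEncrypt", "WriteFile", "DeleteFile"]

counts = qgram_counts(log, q=3)
vocabulary = sorted(counts)          # in practice built from the whole training set
vector = to_vector(counts, vocabulary)
```

A vector of this kind could then be passed to any classifier; repeated read-encrypt-write patterns show up as q-grams with high counts.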
A ransomware attack can also be detected using the honeypot approach. In this strategy, network administrators set up fake computer resources that serve as decoy machines for detecting suspicious behavior [15]. The honeypot approach allows real-time monitoring of ransomware activity and attack methods. However, it can backfire if the honeypot is used as a launchpad to infect other parts of the system. Pavithra and Selvakumara Samy [16] employ a honeypot approach in which a bogus folder is created to funnel the attack and monitor the changes in real time. Other researchers have developed tools and applications to detect ransomware attacks based on this concept. Kolodenker et al. [17] propose PayBreak, a mechanism that stores cryptographic encryption keys in a key vault; the stored keys are then used to decrypt the files and folders encrypted by ransomware. SH-VARR is a framework developed by Al-Dwairi et al. [18] that uses the link concept to protect any XML document from being encrypted or deleted during a ransomware attack.
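The decoy-folder idea can be sketched as follows. This is a simplified illustration and not the system of Pavithra and Selvakumara Samy [16]: the helper names `snapshot` and `changed_files`, the file names, and the use of SHA-256 digests are all assumptions. The sketch records a digest for each decoy file and reports any file whose contents later change, which is the kind of event a real-time monitor would raise an alert on.

```python
import hashlib
import os
import tempfile

def snapshot(folder):
    """Map each file name in the decoy folder to the SHA-256 digest of its contents."""
    digests = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            with open(path, "rb") as handle:
                digests[name] = hashlib.sha256(handle.read()).hexdigest()
    return digests

def changed_files(before, after):
    """Return names that were modified, deleted, or created between two snapshots."""
    names = set(before) | set(after)
    return sorted(n for n in names if before.get(n) != after.get(n))

with tempfile.TemporaryDirectory() as decoy:
    # Populate the decoy folder with hypothetical bait files.
    for name in ("invoice.txt", "notes.txt"):
        with open(os.path.join(decoy, name), "w") as handle:
            handle.write("decoy content")
    baseline = snapshot(decoy)
    # Simulate an encryptor overwriting one bait file with random bytes.
    with open(os.path.join(decoy, "invoice.txt"), "wb") as handle:
        handle.write(os.urandom(32))
    alerts = changed_files(baseline, snapshot(decoy))
```

Polling snapshots is the simplest design; a production monitor would more likely subscribe to file-system change notifications instead.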
Identifying different types of ransomware is a difficult and time-consuming task. Establishing a robust countermeasure is becoming harder because ransomware creators constantly upgrade their products to evade new detection methods. To address this, researchers have employed Software-Defined Networking (SDN) for ransomware detection using deep packet inspection of POST and GET requests [19], [20]. Once the ransomware is detected, the IP addresses of its servers are blacklisted by the servers in charge of controlling the addresses. However, this countermeasure has a high false-positive rate of nearly 4.95 percent, which causes useful services to be restricted incorrectly [20]. As a result, the network administrator must check the network traffic between the victim's computer and the server for unlawful contact and block the transfer of the encryption key.
A classification model can be implemented to analyze a victim's computer traffic and to identify the encryption key retrieval process. To this end, Sgandurra et al. [21] propose EldeRan, which performs highly accurate dynamic analysis of ransomware using a machine learning algorithm. EldeRan is designed to identify ransomware infections with a significantly higher true-positive rate and a low false-positive rate. The results showed that API calls and registry keys are the most relevant classification features. EldeRan can also identify unknown ransomware, with an average error rate of 2.4 percent.
This research presents an in-depth analysis of ransomware attack detection using dynamic analysis and applies various machine learning classifiers to classify the ransomware attack: Random Forest, Naïve Bayes, J48, Decision Table, and Hoeffding Trees.

A. Dataset Collection
The TON IoT Datasets [22] from the University of New South Wales (UNSW) were used to capture ransomware attack features on Windows 7. A total of 132 features of ransomware attacks targeting Windows 7 machines were taken from the dataset. The data was acquired primarily through a realistic, large-scale network built at the Cyber Range and IoT Labs of the School of Engineering and Information Technology (SEIT), UNSW Canberra, at the Australian Defence Force Academy (ADFA). The dataset is crucial for ensuring that the classification results are accurate. It also helps the researchers to understand ransomware features better and to explain the behavior of ransomware outbreaks on the Windows 7 operating system. The dataset was then investigated further, and the results were used to classify potential ransomware attacks. The testbed was configured with numerous virtual Windows 7 machines and a single attacker host to carry out the ransomware attack, as illustrated in Fig. 3.
The Performance Monitor Tool was used to collect the data that details the attack. The raw data was collected in a *.blg file containing data for disk, process, processor, memory, and network activities. Records captured under normal conditions before the ransomware attack were labelled "Normal," while records from the successfully infected machine were labelled "Ransomware."

B. Machine Learning Approach for the Ransomware Classification
The Waikato Environment for Knowledge Analysis (WEKA) tool is used to implement the feature selection and machine learning classification algorithms in this research. It is a well-known Java-based machine learning software package created at the University of Waikato, New Zealand [23], [24]. WEKA supports a wide range of data mining tasks, including data preprocessing, clustering, classification, regression, visualization, and feature selection [25]-[27].
The Windows 7 features listed in Table 1 are categorized by key attributes and used to train the classification model. The collected features are optimized to ensure accurate ransomware attack classification [28]. This streamlines the classification process by reducing training and testing time. Irrelevant and redundant features that do not add to the classification model's accuracy are identified and eliminated using the feature selection method [29]. The number of features is thereby reduced from 132 to 77. Features are removed when they do not change significantly after the ransomware attack.
The dataset with the selected features is used to train the ransomware classification model and to identify the ransomware attack during the testing phase. For this purpose, the dataset is divided into training and testing portions at a ratio of 70:30.
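The selection criterion described above, dropping features that do not change significantly after the attack, can be sketched in a few lines. The paper does not state the exact test applied in WEKA, so the rule below is an assumption: a feature is kept when its mean shifts by more than a chosen number of standard deviations of the combined samples. The function name, the threshold of 0.5, and the counter names are hypothetical.

```python
from statistics import mean, pstdev

def select_changed_features(normal_rows, attack_rows, threshold=0.5):
    """Keep features whose mean shifts by more than `threshold` standard
    deviations (computed over the combined normal and attack samples)."""
    selected = []
    for feature in normal_rows[0]:
        before = [row[feature] for row in normal_rows]
        after = [row[feature] for row in attack_rows]
        spread = pstdev(before + after)
        if spread == 0:
            continue  # constant feature: it cannot separate the classes
        if abs(mean(after) - mean(before)) / spread > threshold:
            selected.append(feature)
    return selected

# Hypothetical counter names and values, for illustration only.
normal = [{"disk_writes": 10, "cpu_idle": 90, "page_size": 4},
          {"disk_writes": 12, "cpu_idle": 91, "page_size": 4}]
attack = [{"disk_writes": 95, "cpu_idle": 89, "page_size": 4},
          {"disk_writes": 99, "cpu_idle": 92, "page_size": 4}]

kept = select_changed_features(normal, attack)
```

In this toy data only `disk_writes` moves between the two conditions, so it is the only feature retained; the constant and unchanged features are discarded.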
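A 70:30 division of a labelled dataset can be sketched as below. The paper does not say whether the split was stratified or how it was randomized, so stratification by class label, the fixed seed, and the row layout are assumptions made for this illustration.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_key="label", train_fraction=0.7, seed=42):
    """Shuffle each class separately, then cut it 70:30 so that both
    portions keep roughly the same class mix as the full dataset."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test

# Hypothetical dataset: 100 "Normal" rows and 30 "Ransomware" rows.
rows = ([{"id": i, "label": "Normal"} for i in range(100)] +
        [{"id": 100 + i, "label": "Ransomware"} for i in range(30)])
train, test = stratified_split(rows)
```

Stratifying matters when one class is much rarer than the other, since a purely random cut could leave the test portion with too few ransomware records to evaluate on.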

III. RESULTS AND DISCUSSION
The supervised machine learning approach was used in this research because it provides promising outcomes by reducing errors [30], [31]. This research compares the performance of five notable classifiers: Random Forest, Naïve Bayes, J48, Decision Table, and Hoeffding Trees. Accuracy, false positive rate (FPR), precision, recall, and F-Measure are used as evaluation metrics. Table 2 lists the results of all five classifiers in the testing phase. Random Forest and J48 hold the highest accuracy, at 99.74 percent. Decision Table, on the other hand, had the lowest accuracy, at 98.70 percent. The results indicate that the Random Forest and J48 classifiers accurately identify ransomware outbreaks. The high accuracy of both classifiers is backed up by a high precision of 97.70 percent. Furthermore, the high accuracy obtained may indicate that the feature selection strategy used in this research played an important part in producing superior results.

A. Confusion Matrix
The confusion matrix is used to summarize the performance of the classification model. Table 3 presents the results for the two classes: normal and ransomware. For all classifiers, a sample is labeled "ransomware" when the classification model predicts the presence of ransomware activity and "normal" otherwise.
J48 and Random Forest outperformed the other classifiers in ransomware detection, classifying all 82 actual ransomware samples as "ransomware". In terms of false predictions, both J48 and Random Forest hold the lowest number, with only 1 case each. Therefore, the J48 and Random Forest classifiers are more accurate in detecting ransomware attacks than the other classifiers.
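The evaluation metrics used in this research all follow from the four counts of a two-class confusion matrix. The sketch below shows the standard formulas, treating "ransomware" as the positive class. The counts passed in are hypothetical and are not the values of Table 3; the function name is likewise an assumption.

```python
def classification_metrics(tp, fn, fp, tn):
    """Derive the evaluation metrics from the four confusion-matrix counts.

    tp: ransomware classified as ransomware   fn: ransomware classified as normal
    fp: normal classified as ransomware       tn: normal classified as normal
    """
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # identical to the true positive rate
    f_measure = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "fpr": fpr, "precision": precision,
            "recall": recall, "f_measure": f_measure}

# Hypothetical counts chosen for illustration; NOT taken from Table 3.
metrics = classification_metrics(tp=45, fn=5, fp=10, tn=140)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Accuracy alone can look high on an imbalanced test set, which is why FPR, precision, and recall are reported alongside it.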

B. Receiver Operating Characteristics (ROC) Curve
In addition to the confusion matrix, the ROC curve for each machine learning classifier was calculated in this research. The True Positive Rate (TPR) is the rate at which ransomware attacks are correctly classified, whereas the FPR is the rate at which normal cases are incorrectly classified as ransomware attacks. The ROC curves are shown in Fig. 4. The horizontal axis of Fig. 4 depicts the false positive rate, whereas the vertical axis depicts the true positive rate. The four lines represent the ROC curves of the machine learning classifiers. Because ROC curves are difficult to compare directly, the area under each curve (AUC) was calculated. The AUC results indicate whether a classifier is effective in detecting ransomware. AUC values fall into two ranges: a value between 0.5 and 1 indicates that the classifier separates the classes better than chance, with 1 representing perfect classification, whereas a value between 0 and 0.5 represents inadequate classification. According to the AUC results in Table 4, the Random Forest and J48 classifiers possess the highest AUC value, 0.997. This outcome implies that both classifiers performed very well. The ROC curves and AUC values given in this research demonstrate that all classifiers produced compelling and accurate results in detecting ransomware outbreaks.
Additionally, Table 5 compares the time taken to train the classification model. The Naïve Bayes classifier has the fastest training time. Random Forest and J48 came in second and third, respectively, with the Hoeffding Tree trailing by a small margin. The results also show that the Decision Table takes the longest time to train.
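The relation between the ROC curve and the AUC can be made concrete with a short sketch. It is not the routine WEKA uses internally; the function names and the classifier scores are hypothetical. The decision threshold is swept from the highest score to the lowest, an (FPR, TPR) point is recorded at each step, and the area under the resulting curve is computed with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Sweep the threshold from high to low and record (FPR, TPR) pairs.
    labels: 1 = ransomware (positive), 0 = normal (negative)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a tied score group.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical classifier scores for eight test samples.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
area = auc(roc_points(scores, labels))
```

The AUC equals the fraction of (ransomware, normal) pairs in which the ransomware sample receives the higher score, so a value of 0.5 corresponds to random ranking and 1 to perfect separation.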

IV. CONCLUSION
This research examined the performance of five machine learning classifiers in detecting ransomware attacks: Random Forest, Naïve Bayes, J48, Decision Table, and Hoeffding Trees. A dynamic analysis approach that implements machine learning classifiers was used, and the ransomware attacks were accurately classified according to the selected features. The research analyzed the sample dataset and the Windows 7 machine to monitor changes in the features before and after the ransomware attack. The experimental findings demonstrated that the Random Forest and J48 classifiers had the highest accuracy in detecting a ransomware attack.
Based on the findings, the following conclusions can be drawn. The Random Forest and J48 classifiers with dynamic analysis can detect ransomware attacks with remarkable performance. The Random Forest classifier with a tree size of 77 produced a high accuracy of 99.74 percent, a high AUC of 0.997, and a low FPR of 0.26 with just 0.07 seconds of training time. The reduction of classification features from 132 to 77 improved the ransomware detection rate. The Random Forest classifier performed similarly to J48 in all tests, making the two classifiers the most effective solutions for detecting ransomware attacks.