Inversed Control Parameter in Whale Optimization Algorithm and Grey Wolf Optimizer for Wrapper-Based Feature Selection: A Comparative Study

— Whale Optimization Algorithm (WOA) and Grey Wolf Optimizer (GWO) are well-performing metaheuristic algorithms used by various researchers to solve feature selection problems. Yet, the slow convergence speed of WOA and GWO can degrade feature selection performance and classification accuracy. Therefore, to overcome this issue, a modified WOA (mWOA) and a modified GWO (mGWO) for wrapper-based feature selection are proposed in this study. The proposed mWOA and mGWO were given a new inversed control parameter expected to give the search agents more search area in the early phase of the algorithms, resulting in a faster convergence speed. This comparative study aims to investigate and compare the effectiveness of the inversed control parameter in the proposed methods against the original algorithms in terms of the number of selected features and classification accuracy. The proposed methods were implemented in MATLAB, where 12 datasets of different dimensionality from the UCI repository were used. kNN was chosen as the classifier to evaluate the classification accuracy of the selected features. Based on the experimental results, mGWO did not show significant improvements in feature reduction and maintained accuracy similar to the original GWO. On the contrary, mWOA outperformed the original WOA on both criteria, even on high-dimensional datasets. Evaluating the execution time of the proposed methods, utilizing different classifiers, and hybridizing the proposed methods with other metaheuristic algorithms to solve feature selection problems would be future work worth exploring.


I. INTRODUCTION
Feature selection has been a popular topic among researchers over the past decades. It is the process of reducing dimensionality by selecting the most relevant features and discarding less relevant features in a dataset [1]. The need for such dimensionality reduction stems from the ever-growing dimensionality of datasets amid technological advancement. A high-dimensional dataset (HDD) is a matrix consisting of many columns or rows representing a huge number of features or instances [2]. Generally, not every feature provides significant information for classification, and less relevant features in a dataset can degrade classification accuracy. Especially in HDDs, irrelevant features cause memory constraints and result in expensive training and computing costs, a problem also known as the "curse of dimensionality" [2], [3]. Therefore, feature selection is often related to two main tasks: promoting maximum classification accuracy and selecting a minimum number of features to avoid the "curse of dimensionality" [4]-[7]. Besides, feature selection can be interpreted as a preprocessing step for data classification, where the class label of an instance is assigned by the trained learning model [8]. Since the training data is formed from the selected features, the ability to select fewer features without lowering the classification accuracy is the criterion for evaluating the effectiveness of a feature selection method.
There are three feature selection methods: filter-based, embedded-based, and wrapper-based [9]. The filter-based method carries out feature selection without using any learning algorithms; it depends solely on the mutual information provided by the features and their relationship with the class label [10]. After filter-based feature selection takes place, any classifier can be utilized to evaluate the quality of the selected features in a dataset. Therefore, the filter-based method is flexible to implement and fast to execute. The embedded-based method performs feature selection during the classification process; hence, it can produce high classification accuracy [9]. Unlike filter-based and embedded-based methods, the wrapper-based method uses a specific classifier to assess the quality of each feature subset. Thus, it has higher time complexity but also higher classification accuracy [9]. Therefore, wrapper-based feature selection is the focus of this study.
Over the years, metaheuristic optimization algorithms have been applied to feature selection. In metaheuristic algorithms, the exploration and exploitation phases determine the search mechanism used to obtain the optimal solution [11]. The exploration phase is where the algorithm randomly looks for potential areas in the search space, whereas the exploitation phase is where the algorithm further scrutinizes a certain search space area found during exploration. These two phases are further discussed in Section 2. Some of the recent metaheuristic optimization algorithms are Manta Ray Foraging Optimization [12], Harris Hawks Optimization [13], Grey Wolf Optimizer (GWO) [14], and Whale Optimization Algorithm (WOA) [15]. Many researchers have employed these metaheuristic algorithms in various domains, such as solving medical problems [16], electromagnetic problems [17], predicting the rate of traffic congestion [18], and handling feature selection problems [10], [19]-[21].
Notably, WOA and GWO have shown similarities in terms of the parameters and characteristics in their algorithm designs, which will be explained in Section 2. These two algorithms are excellent optimizers and have yielded great performance in both filter-based and wrapper-based feature selection methods due to their strengths in exploration and exploitation [10], [19], [20], [22]-[24]. Al-Tashi et al. implemented a wrapper-based feature selection with GWO to select the most relevant features to be used in diagnosing coronary artery disease with a two-stage approach [22]. The most relevant features in the Cleveland Heart disease dataset were identified in the first stage, while the fitness function was evaluated by the Support Vector Machine (SVM) classifier in the second stage. Based on the experimental results, the proposed GWO method obtained 89.83%, 93%, and 91% for classification accuracy, sensitivity, and specificity, respectively.
Another study by Hu, Pan, and Chu [23] implemented a wrapper-based feature selection using a binary variant of GWO (BGWO) and an improved BGWO variant (ABGWO). The binary conversion was achieved using four V-shaped transfer functions so that the continuous values were mapped to binary values, whereas a new control parameter in ABGWO was utilized to improve the convergence speed of BGWO. The classifier used to obtain classification accuracy in the study was the k-nearest neighbor (kNN). Based on the experimental results, with improved convergence speed, the proposed ABGWO outperformed BGWO in both classification accuracy and execution time in most cases.
Mafarja and Mirjalili [19], [20] introduced two wrapper-based feature selection studies with WOA. Both studies produced outstanding classification accuracy using the kNN classifier. In the first study, WOA was improved by Tournament selection (WOA-T), roulette wheel selection (WOA-R), and the crossover and mutation operators (WOA-CM) [19]. The execution time of WOA-CM was the lowest compared to the other methods. The average feature selection ratio of WOA-CM was smaller than WOA's, and the average classification accuracy of WOA-CM was higher than WOA's on most datasets [19]. In the second study, WOA was hybridized with Simulated Annealing (WOA-SA), where WOA handled the global search and SA handled the local search [20]. In addition, WOA-SA with the Tournament selection mechanism (WOA-SAT) was also introduced in the same study. The average feature selection ratio of WOA-SAT was lower than WOA's, while the average classification accuracy of WOA-SAT was greater than WOA's on most datasets [20].
Nematzadeh et al. [10] implemented a filter-based feature selection method with WOA and Mutual Congestion that applied different feature-discarding rates, from 20% to 80%, on four medical HDDs. Using the SVM, Naïve Bayes, and Decision Tree classifiers to evaluate the quality of the selected features, the proposed method yielded the highest average classification accuracy when 50% of the features from the original HDDs were discarded. The promising results from Nematzadeh et al. [10] inspired the work of another filter-based feature selection method by Yab, Wahid, and Hamid [24] with a modified WOA (mWOA). The study adopted a similar approach of using a 50% feature selection size for dimensionality reduction and introduced a new control parameter formula in mWOA to improve the convergence speed of WOA.
According to the literature, slow convergence speed is a known issue in WOA, and the problem is also found in GWO [25]. GWO is compared with WOA as they have similar characteristics and parameters [23], [26]. In both WOA and GWO, the control parameter, a, determines the trade-off between the exploration and exploitation phases [11], [25], [27]. Since the value of a affects the coefficient vector A⃗, it contributes to the distance between the search agents and the prey over iterations. As a result, the global and local search mechanisms, which decide when to converge to the optimal solution, are affected by a. Therefore, to reduce the impact of slow convergence speed, a modification of the control parameter's formula is required [11], [28]. Previous studies reported that ABGWO for wrapper-based feature selection [23] and mWOA for filter-based feature selection [24] both improved the convergence speed. However, the performance of filter-based mWOA has only been tested against filter-based WOA and against no feature selection (NO FS).
Therefore, this study presents a comparative study of (i) a proposed mWOA as a wrapper-based feature selection and (ii) a modified GWO (mGWO) by inversing its control parameter formula as in mWOA. This comparative study aims to evaluate the effectiveness of the modified control parameter's formula for both mWOA and mGWO against the original GWO and WOA. The proposed wrapper-based mWOA and mGWO were implemented in MATLAB, and the kNN classifier was used to evaluate the significance of the selected features toward the classification accuracy.
The rest of the paper is organized as follows. Section 2 presents the materials and methods used in this work, which covers the inspiration and mathematical equations for original WOA and GWO, and introduces the proposed method involving the inversed control parameter for mWOA and mGWO. The experimental results are tabulated and thoroughly discussed in Section 3. Lastly, Section 4 concludes the findings of this comparative study and provides insights into the potential future work.

II. MATERIALS AND METHOD
This section presents the materials used in this study: Whale Optimization Algorithm and Grey Wolf Optimizer. The proposed method and experimental setups used in realizing the comparative study are also included in this section.

A. Whale Optimization Algorithm
Whale Optimization Algorithm (WOA) is a nature-inspired metaheuristic algorithm introduced by Mirjalili and Lewis [15] in 2016 for solving optimization problems. WOA has shown spectacular performance in various research areas such as engineering, transportation, and medical diagnosis [16], [18], [19], [29]. The reason behind WOA's excellent performance is its searching mechanism, which imitates the bubble-net feeding method of humpback whales, as shown in Fig. 1.

Fig. 1 The bubble-net feeding method of humpback whales [15]

1) Encircling prey: In this phase, the whales encircle the prey. Since the optimal solution (the prey) is not yet identified, the current best position is assumed to be the prey. Once the best position is known, the whales update their positions toward it over iterations. This scenario is represented by Eq. (1) and Eq. (2):

D⃗ = |C⃗ · X*⃗(t) − X⃗(t)|  (1)
X⃗(t + 1) = X*⃗(t) − A⃗ · D⃗  (2)

where t denotes the current iteration, X⃗ is the position vector, X*⃗ represents the position vector of the best position obtained so far, while | | and · indicate the absolute value and the dot product operator, respectively. The value of X*⃗ is updated whenever a better position is found at an iteration. The distance between the current whale and the best position is denoted by D⃗. The coefficient vectors A⃗ and C⃗ are given in Eq. (3) and Eq. (4):

A⃗ = 2a⃗ · r⃗ − a⃗  (3)
C⃗ = 2 · r⃗  (4)

where r⃗ is a random vector between 0 and 1, while a⃗ is a control parameter that decreases linearly from 2 to 0 using Eq. (5), where Tmax indicates the maximum iteration:

a = 2 − 2t/Tmax  (5)
2) Bubble-net attacking (exploitation): In this phase, the whales perform bubble-net attacking through two maneuvers: shrinking the circle and moving along a spiral-shaped route. The first maneuver is achieved by Eq. (2), while the second maneuver is achieved by Eq. (6):

X⃗(t + 1) = D′⃗ · e^(bl) · cos(2πl) + X*⃗(t)  (6)

where b is a constant defining the shape of the logarithmic spiral, l denotes a random number between -1 and 1, and D′⃗ = |X*⃗(t) − X⃗(t)| is the distance between the i-th whale and the prey. Since the whales need to carry out these two maneuvers at the same time, a probability p is used to decide which maneuver to perform, as presented in Eq. (7). If p < 0.5, the whales shrink the circle; otherwise, they move along the spiral-shaped route.
3) Searching for prey (exploration): In this phase, whales look for potential positions randomly based on their positions relative to one another. Unlike the exploitation phase, the position of a whale is updated using a randomly selected whale instead of the best position discovered. This scenario is represented by Eq. (8) and Eq. (9):

D⃗ = |C⃗ · Xrand⃗ − X⃗|  (8)
X⃗(t + 1) = Xrand⃗ − A⃗ · D⃗  (9)

where Xrand⃗ indicates the position of a random whale selected from the population.
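As a minimal illustration of the update rules above, the following Python sketch performs one WOA iteration over the whole population. It is a reconstruction from Eqs. (1)-(9) with scalar coefficients, not the authors' MATLAB code, and the names `woa_step` and `b` are illustrative assumptions.

```python
import numpy as np

def woa_step(X, X_best, t, T_max, b=1.0, rng=None):
    """One WOA position update for a population X (n_agents x dim)."""
    if rng is None:
        rng = np.random.default_rng(0)
    a_t = 2 - 2 * t / T_max                     # Eq. (5): a decreases from 2 to 0
    X_new = np.empty_like(X)
    for i, Xi in enumerate(X):
        A = 2 * a_t * rng.random() - a_t        # Eq. (3), scalar variant
        C = 2 * rng.random()                    # Eq. (4)
        if rng.random() < 0.5:                  # Eq. (7): probability p picks the maneuver
            if abs(A) < 1:                      # exploitation: encircle the best position
                D = np.abs(C * X_best - Xi)     # Eq. (1)
                X_new[i] = X_best - A * D       # Eq. (2)
            else:                               # exploration: follow a random whale
                X_rand = X[rng.integers(len(X))]
                D = np.abs(C * X_rand - Xi)     # Eq. (8)
                X_new[i] = X_rand - A * D       # Eq. (9)
        else:                                   # spiral bubble-net maneuver
            l = rng.uniform(-1, 1)
            D_prime = np.abs(X_best - Xi)       # distance to the prey
            X_new[i] = D_prime * np.exp(b * l) * np.cos(2 * np.pi * l) + X_best  # Eq. (6)
    return X_new
```

Note that as a_t shrinks toward 0 in late iterations, |A| < 1 almost always holds, so the population is pulled toward X_best, which is the convergence behavior discussed in Section 2-C.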
The pseudocode of WOA is shown in Fig. 2.

B. Grey Wolf Optimizer
Grey Wolf Optimizer (GWO) is a nature-inspired metaheuristic algorithm introduced by Mirjalili et al. in 2014 for solving optimization problems [14]. It has shown great performance in electromagnetics [17] and feature selection [22], [23]. GWO performs well due to the grey wolves' hunting mechanism, which involves a special social hierarchy.

1) Social hierarchy:
The algorithm of GWO is inspired by the grey wolves' social hierarchy in group hunting. The hierarchy consists of four levels, which are alpha (α), beta (β), delta (δ), and omega (ω) from top to bottom, as illustrated in Fig. 3.

Fig. 3 Grey wolves' social hierarchy [14]

The alpha wolves are the highest in the social hierarchy, and they lead the three levels below them. Being the second highest in the hierarchy, beta wolves obey alpha wolves and dominate delta and omega wolves. Similarly, delta wolves are dominated by the first two levels of wolves and dominate the bottom level. The last in the hierarchy are the omega wolves, and the top three levels of wolves dominate them. This social hierarchy concept is transferred into the algorithm of GWO, where alpha, beta, and delta are categorized as the fittest, second fittest, and third fittest solutions, respectively. The solutions other than the top three fittest ones are called omega.
2) Encircling prey: In this phase, the wolves encircle the prey (best position). The scenario is represented by Eq. (10) and Eq. (11):

D⃗ = |C⃗ · Xp⃗(t) − X⃗(t)|  (10)
X⃗(t + 1) = Xp⃗(t) − A⃗ · D⃗  (11)

where t represents the current iteration, X⃗ denotes the position vector of a grey wolf, Xp⃗ represents the position vector of the fittest solution obtained so far, and · is the dot product operator. Xp⃗ is updated in each iteration once a better position is found. Parameter D⃗ denotes the distance between the current wolf and the current best position. The coefficient vectors A⃗ and C⃗ are presented in Eq. (12) and Eq. (13), respectively:

A⃗ = 2a⃗ · r1⃗ − a⃗  (12)
C⃗ = 2 · r2⃗  (13)

where r1⃗ and r2⃗ are random vectors between 0 and 1, whereas a⃗ is a control parameter that decreases linearly from 2 to 0. The decrease in a⃗ follows Eq. (14), where Tmax indicates the maximum iteration, similar to WOA:

a = 2 − 2t/Tmax  (14)
3) Hunting: The hunting mechanism of GWO depends on the alpha, beta, and delta wolves, as shown in Eq. (15) and Eq. (16):

Dα⃗ = |C1⃗ · Xα⃗ − X⃗|, Dβ⃗ = |C2⃗ · Xβ⃗ − X⃗|, Dδ⃗ = |C3⃗ · Xδ⃗ − X⃗|  (15)
X1⃗ = Xα⃗ − A1⃗ · Dα⃗, X2⃗ = Xβ⃗ − A2⃗ · Dβ⃗, X3⃗ = Xδ⃗ − A3⃗ · Dδ⃗  (16)

It is worth mentioning that the top three fittest solutions are assumed to be the best position since the optimal solution is unknown. Hence, GWO uses the mean position of the alpha, beta, and delta wolves to update each wolf's position based on Eq. (17):

X⃗(t + 1) = (X1⃗ + X2⃗ + X3⃗)/3  (17)
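The hunting update in Eqs. (12)-(17) can be sketched in a few lines of Python. This is an illustrative reconstruction with scalar coefficients, not the authors' MATLAB implementation, and the name `gwo_step` is an assumption.

```python
import numpy as np

def gwo_step(X, X_alpha, X_beta, X_delta, t, T_max, rng=None):
    """One GWO position update driven by the three fittest wolves."""
    if rng is None:
        rng = np.random.default_rng(0)
    a_t = 2 - 2 * t / T_max                     # Eq. (14): a decreases from 2 to 0
    X_new = np.empty_like(X)
    for i, Xi in enumerate(X):
        candidates = []
        for leader in (X_alpha, X_beta, X_delta):
            A = 2 * a_t * rng.random() - a_t    # Eq. (12)
            C = 2 * rng.random()                # Eq. (13)
            D = np.abs(C * leader - Xi)         # Eq. (15)
            candidates.append(leader - A * D)   # Eq. (16)
        X_new[i] = np.mean(candidates, axis=0)  # Eq. (17): mean of X1, X2, X3
    return X_new
```

At t = Tmax, a_t is 0, so every wolf collapses onto the mean of the three leaders, which illustrates how the decaying control parameter forces convergence.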

4) Attacking prey (exploitation):
The process of attacking prey is also known as the local search, where the grey wolves hunt the prey once it stops moving. The value of A⃗ is affected by a⃗, whereby A⃗ falls randomly within the range of -2 to 2 based on Eq. (12). If |A⃗| < 1, the value of A⃗ is between -1 and 1; thus, the wolf attacks the prey because the wolf's next position would lie between its current position and the prey's position.

5) Searching for prey (exploration):
The search process is also known as the global search, where the positions of all top three levels of wolves are used to look for the best solution. The global search is the opposite of the local search. Each wolf looks for prey separately and then converges toward the best solution found. If |A⃗| > 1, the value of A⃗ is smaller than -1 or larger than 1; thus, the wolf moves away from the current prey position to find a better solution. Therefore, this vector manages the balance between exploration and exploitation. Besides A⃗, the C⃗ vector also drives exploration. The value of C⃗ is a random number between 0 and 2, and it randomly weights (C > 1) or unweights (C < 1) the effect of the prey based on distance. This allows the algorithm to exhibit random behavior during optimization, encouraging exploration and avoiding local optima. The pseudocode of GWO is shown in Fig. 4.

C. Proposed Inversed Control Parameter
Based on the literature, mWOA was designed for filter-based feature selection to improve the convergence speed while performing a 50% feature reduction on four medical HDDs [24]. However, its performance has not yet been tested in a wrapper-based feature selection method nor evaluated on datasets of different dimensionality. Hence, this study proposed two wrapper-based feature selection methods using a modified WOA (mWOA) and a modified GWO (mGWO) that improve the convergence speed by inversing the control parameter.
In the original WOA and GWO, the control parameter a is defined by Eq. (5), a = 2 − 2t/Tmax. The decreasing trend of a decreases the distance and increases the possibility of changing positions, which covers less search space in the initial phase of the iterations, resulting in a slower convergence speed [23]. Hence, the situation had to be changed the other way around to enable a faster convergence speed [24]. Therefore, the linearly decreasing control parameter in WOA and GWO was inversed to be linearly increasing with Eq. (18) for the proposed mWOA and mGWO:

a = 2t/Tmax  (18)

With the new control parameter formula, the value of a increases linearly from 0 to 2, as depicted in Fig. 6. Table 1 shows the comparison of the original and proposed methods. By using the inversed control parameter from Eq. (18), the increasing trend of a increases the distance and decreases the possibility of changing positions, which covers more search space in the initial phase of the iterations, resulting in a faster convergence speed in the proposed mWOA and mGWO.
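The two schedules can be written down directly. The helper names `a_original` and `a_inversed` are illustrative, not from the paper; the formulas follow Eqs. (5)/(14) and Eq. (18).

```python
def a_original(t, t_max):
    """Eqs. (5)/(14): control parameter decreases linearly from 2 to 0."""
    return 2 - 2 * t / t_max

def a_inversed(t, t_max):
    """Eq. (18): inversed control parameter increases linearly from 0 to 2."""
    return 2 * t / t_max
```

For any iteration t the two schedules sum to 2, so the proposed method simply mirrors the original trajectory: large |A⃗| (exploration) is shifted from the early iterations of the original schedule to the late iterations, and vice versa.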
The inversed control parameter in the proposed mWOA and mGWO is expected to improve the convergence speed of WOA and GWO in wrapper-based feature selection. With a better convergence speed reflecting the algorithm's performance, the search agents are more likely to excel in selecting only the most important features in a dataset, those that truly contribute to the classification accuracy. Therefore, the effectiveness of the inversed control parameter for both mWOA and mGWO against their original forms, WOA and GWO, was tested in the experiments described in the following subsection.

D. Experimental Setup
The experiments were implemented using MATLAB version R2017b. A total of 12 benchmark datasets from UCI [30] with different dimensionality were used. The datasets are sorted in descending order based on dimensionality (instance × feature), as shown in Table 2. Four of the datasets are HDDs with more than 2,000 features, namely, SMK_CAN_187, GLI_85, CNS, and Colon, while the rest are non-HDDs.

A wrapper-based feature selection approach with the kNN classifier was used, where k was fixed to 5. Hold-out validation was implemented in this study, where the N instances of each dataset were randomly partitioned into two subsets: 20% for the testing set and 80% for the training set. The same setting was used in each experiment to avoid bias in the results. The population size and maximum iteration were set to 10 and 100, respectively. The machine used to run the experiments was an Intel Core i7-10750H CPU @ 2.60GHz with 32GB RAM. The experiments were conducted for 10 runs with two pairs of algorithms: WOA and mWOA, as well as GWO and mGWO. In the next section, both the proposed mWOA and mGWO are compared against their original forms using the following criteria:

• The number of selected features in the best, average, and worst cases: the best number of selected features was the minimum value in 10 runs, while the worst number of selected features was the maximum value in 10 runs. The average number of selected features was computed as the mean of the selected features over all ten runs. The feature reduction rate can also be computed using Eq.
• The classification accuracy in the best, average, and worst cases was obtained using the selected features on the test set. The best accuracy was the maximum value in 10 runs, while the worst accuracy was the minimum value in 10 runs. The average accuracy was computed as the mean of the accuracies over all ten runs.
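A wrapper-based fitness evaluation of the kind described above can be sketched as follows, using a plain kNN (k = 5) with an 80/20 hold-out split and the conventional feature reduction rate, 100 × (1 − selected/total). This is an illustrative sketch under those assumptions, not the paper's MATLAB code, and all function names are hypothetical.

```python
import numpy as np

def knn_accuracy(X_tr, y_tr, X_te, y_te, k=5):
    """Plain kNN by majority vote among the k nearest training points."""
    correct = 0
    for x, y in zip(X_te, y_te):
        dist = np.linalg.norm(X_tr - x, axis=1)
        neighbours = y_tr[np.argsort(dist)[:k]]
        labels, counts = np.unique(neighbours, return_counts=True)
        correct += labels[np.argmax(counts)] == y
    return correct / len(y_te)

def wrapper_fitness(mask, X, y, k=5, test_frac=0.2, rng=None):
    """kNN accuracy when training only on the features flagged in the binary mask."""
    if rng is None:
        rng = np.random.default_rng(0)   # fixed split, as in the paper's shared setting
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 0.0                       # an empty subset cannot classify anything
    idx = rng.permutation(len(X))
    n_te = int(test_frac * len(X))
    te, tr = idx[:n_te], idx[n_te:]
    return knn_accuracy(X[tr][:, cols], y[tr], X[te][:, cols], y[te], k)

def reduction_rate(n_selected, n_total):
    """Feature reduction rate in percent."""
    return 100.0 * (1 - n_selected / n_total)
```

As a sanity check, the GLI_85 example from Section 3 (2,868 features selected out of 22,283) gives a reduction rate of 87.13%, matching the figure reported there.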

III. RESULTS AND DISCUSSION
This section presents the experimental results of the proposed mWOA and mGWO against their original forms and discusses the effectiveness of the inversed control parameter based on the number of selected features and classification accuracy.

A. The number of selected features
The wrapper-based feature selection experiments utilized the 12 datasets in Table 2 for 10 runs, and the number of selected features varies across test cases. Table 3 presents the number of selected features by WOA and mWOA in their best, average, and worst cases. Based on the experimental results, mWOA performed better than WOA in 6, 6, and 8 out of 12 datasets in the best, average, and worst test cases, respectively. It also achieved the same performance as WOA in 5, 3, and 2 out of 12 datasets in the best, average, and worst test cases, respectively. This indicates that mWOA selected fewer features in at least 50-91.67% of datasets in the best case, 50-75% of datasets in the average case, and 66.67-83.33% of datasets in the worst case. Even in the worst case, mWOA showed its strength in selecting fewer features than WOA in both HDDs and non-HDDs. In the GLI_85 dataset, mWOA selected 2,868 features, resulting in a feature reduction rate of 87.13%, compared to WOA with 60.41%; the improvement mWOA achieved in this HDD is 26.72%. As for non-HDDs, mWOA managed to reduce 43.75% of the features compared to WOA's feature reduction rate of 25%, indicating that mWOA was 18.75% better than WOA. Table 4 shows the number of selected features by GWO and mGWO in their best, average, and worst cases. The experimental results show that mGWO outperformed GWO in 2 datasets in the worst case and achieved the same performance as GWO in 6 datasets in the best case, 3 in the average case, and 1 in the worst case. Overall, the performance of mGWO was not as significant compared to GWO.

Note: Bold value with an asterisk (*) indicates that the proposed mGWO outperformed GWO, while a hash sign (#) indicates they achieved the same performance.
As shown by the best-case results in Table 4, mGWO did not outrun GWO in any of the four HDDs. However, mGWO was still able to achieve high feature reduction rates on HDDs, reducing at least 80% of the features. For example, SMK_CAN_187 was reduced to 3,808 from 19,993 features, GLI_85 to 3,077 from 22,283 features, and CNS to 1,393 from 7,129 features, while the selected features for Colon were 234 out of 2,000. As for the non-HDDs, mGWO retained the same number of selected features as GWO on most datasets. Nonetheless, mGWO and GWO both reduced a maximum of 93.33% and 90.91% of the features in the Wdbc and Parkinson's datasets, respectively, indicating that both GWO and mGWO were capable of dealing with HDDs and non-HDDs.
When comparing the performance of mGWO and GWO in selecting fewer features in the average case, mGWO did not show its strength in HDDs. As for non-HDDs, mGWO mostly produced values that remained the same as GWO's. mGWO's best performance was in non-HDDs, especially the Parkinson's dataset, where the feature set was reduced by 86.36% through selecting only 3 of the 22 features. However, GWO selected 1 feature fewer than mGWO on the same dataset.
Further evaluating the worst case shown in Table 4, mGWO was weaker than GWO. There were only two datasets where mGWO outperformed GWO, comprising an HDD and a non-HDD, namely, GLI_85 and Zoo. In the GLI_85 dataset, mGWO selected 4,553 features with a feature reduction rate of 79.57%, while GWO only reduced 78.88%, a slight improvement of 0.69% by mGWO. The improvement of mGWO in the non-HDD Zoo dataset was larger, where mGWO selected 6.25% fewer features than GWO.
To summarize the feature selection performance, taking the mean over all datasets, the proposed mWOA selected 1.67%, 2.05%, and 4.45% fewer features than WOA in its worst, average, and best cases. mWOA also improved the feature reduction rate by 22.73% over WOA on the SPECT dataset in the best case, by 13.33% on the Wdbc dataset in the average case, and by 26.72% on the GLI_85 HDD even in its worst case. In contrast, the proposed mGWO's feature selection performance was less significant because it merely maintained results similar to the original GWO without major improvement. The following subsection presents the experimental results of the proposed mWOA and mGWO against WOA and GWO based on classification accuracy.

B. Classification accuracy
The classification accuracy obtained by the selected features was evaluated using the kNN classifier. The best, average, and worst results obtained by mWOA against WOA are tabulated in Table 5. Based on the results, mWOA outperformed WOA in 2, 7, and 4 out of 12 datasets in the best, average, and worst test cases, respectively. mWOA also performed equally to WOA in 9, 1, and 5 out of 12 datasets. This suggests that mWOA produced higher accuracy than WOA in at least 16.67-91.67% of datasets in the best case, 58.33-66.67% of datasets in the average case, and 33.33-41.67% of datasets in the worst case. It can be observed from the best case that mWOA had results similar to WOA in general. In the four HDDs, mWOA produced results comparable to WOA, where GLI_85, CNS, and Colon showed 100% accuracy. However, mWOA performed better than WOA in non-HDDs. Specifically, mWOA obtained 90.57% accuracy, 1.89% higher than WOA (88.68%), in the SPECT dataset. The highest performance of mWOA was in the Wine dataset, where mWOA achieved 100% accuracy, 2.86% higher than WOA.
Moreover, based on the results in the average case, mWOA outperformed WOA in most non-HDDs, including the Ionosphere, Yeast, SPECT, Parkinson's, Wine, and Zoo datasets. mWOA again showed the highest improvement in the Wine dataset, scoring 1.72% better than WOA, which demonstrates its ability to increase accuracy in non-HDDs. Although mWOA did not surpass WOA in the Wdbc and Iris datasets, it still generated high accuracies of 94.96% and 97.67%, respectively. Besides, mWOA triumphed over WOA with a significant difference in the Colon dataset: mWOA and WOA attained 98.33% and 96.67% accuracy, respectively, indicating that mWOA selected relevant features that contributed to 1.66% higher classification accuracy than WOA in an HDD.
In addition, even in the worst-case scenario, mWOA displayed significant improvements across datasets of different dimensionality. For instance, in the CNS dataset, mWOA achieved 83.33% accuracy, 8.33% higher than WOA (75%). Beyond generating high accuracy in an HDD, mWOA also attained an 11.43% improvement in a non-HDD, where mWOA and WOA achieved 94.29% and 82.86% accuracy, respectively, in the Wine dataset. Another significant result of mWOA in a non-HDD can be found in the Zoo dataset, with 90% classification accuracy, 5% better than WOA. Furthermore, Table 6 presents the classification accuracy obtained by GWO and mGWO in the best, average, and worst cases. Based on the experimental results, mGWO surpassed GWO in 2 datasets in the best case, 5 in the average case, and 1 in the worst case. Additionally, mGWO maintained the same performance as GWO in 8, 4, and 8 datasets in the best, average, and worst cases, respectively. This means that mGWO performed better than GWO in at least 16.67-83.33% of datasets in the best case, 41.67-75% in the average case, and 8.33-75% in the worst case.
It can be seen that the best-case classification accuracies obtained by mGWO and GWO in Table 6 were mostly higher than 80%. However, for the Yeast dataset, mGWO and GWO only generated 60.47% and 60.81% accuracy, respectively. This might be because the Yeast dataset possesses a complex or uncleaned data structure that degraded the classification process. Besides, mGWO showed better classification accuracy on the rest of the HDDs and non-HDDs in the best case. For instance, mGWO achieved 97.44% accuracy whereas GWO obtained 94.87%, indicating that mGWO was 2.57% better on the Parkinson's dataset. In the HDDs, mGWO attained 100% accuracy on the GLI_85, CNS, and Colon datasets, the same as GWO.
As for the average case, the classification accuracy on the four HDDs obtained by mGWO was either weaker than or the same as GWO's. Nevertheless, mGWO outperformed GWO mostly in non-HDDs. Similar to the best-case results, the highest improvement made by mGWO over GWO in the average case was in the Parkinson's dataset, where mGWO achieved 93.08% accuracy, 1.29% higher than GWO. In the worst case, mGWO only maintained the same performance as GWO on most datasets. The accuracy attained by mGWO in HDDs was weaker than GWO's, and there was only a 1.02% improvement by mGWO in the Yeast dataset compared to GWO.
In a nutshell, taking the mean over all datasets, mWOA achieved 0.17%, 0.15%, and 1.21% higher accuracy than WOA in the best, average, and worst cases, respectively. In non-HDDs such as the Wine dataset, mWOA was 2.86% better than WOA in the best case, 1.66% better than WOA on the Colon HDD in the average case, and 8.33% better than WOA on the CNS HDD in the worst case. As for the proposed mGWO, its classification accuracy mostly remained comparable to the original GWO. However, mGWO did show some improvement in certain non-HDDs, with slightly better accuracy than GWO in the average case. The following section concludes the comparative study and suggests future work related to the extension of this study.

IV. CONCLUSION
Previously, the authors proposed a filter-based feature selection method with a modified WOA (mWOA) to improve the convergence speed and achieve better accuracy than WOA. However, GWO was also found to have the same issue of slow convergence speed. Therefore, as an extension of the previous work, the authors conducted this comparative study by proposing wrapper-based feature selection methods with mWOA and a modified GWO (mGWO) using an inversed control parameter, a. The inversed control parameter was expected to allow the search agents to cover more search space in the early iterations, improving the convergence speed by obtaining the optimal solution faster. Twelve datasets with different dimensionality from UCI were adopted in the experiments, and the kNN classifier was used to evaluate the selected features. The performance of the proposed methods was evaluated against their original algorithms in terms of the number of selected features and classification accuracy.
Based on the experimental results, it can be concluded that the strengths of the proposed mWOA and mGWO were shown differently. Although GWO and WOA share similarities in their algorithms and both suffer from slow convergence, the same modification is not equally effective for both. In terms of feature reduction rate, mWOA obtained 1.67%, 2.05%, and 4.45% better results than WOA in the best, average, and worst cases, respectively. However, mGWO did not outperform GWO in most cases. As for classification accuracy, mWOA outperformed WOA with 0.17%, 0.15%, and 1.21% better results in the best, average, and worst cases, respectively. Meanwhile, mGWO achieved only 0.08% higher classification accuracy than GWO in the best case, with no improvement in the average and worst cases. This shows that the proposed inversed control parameter is effective in mWOA but less effective in mGWO when compared against their original algorithms.
In the future, the execution time of feature selection could be considered as one of the criteria to evaluate the proposed methods. Besides, since the inversed control parameter showed more significant improvement in mWOA, hybridizing mWOA with other recent well-performing metaheuristic algorithms to further investigate its performance would be an interesting study. Some recent metaheuristic algorithms that could be hybridized with mWOA include Manta Ray Foraging Optimization and Harris Hawks Optimization. Lastly, other classifiers could be wrapped with mGWO to select relevant features that produce better classification accuracy in HDDs.