ON INFORMATICS

One sign of how successfully a university carries out its educational process is the timely graduation of its students. This study compares the Analytical Hierarchy Clustering (AHC) approach with the K-Medoids method, two data mining techniques, for grouping student data based on school origin, region of origin, average math score, TOEFL, GPA, and length of study. The study was carried out at University X, which has various faculties; one of them is made up of the R, S, T, and U departments. Both K-Medoids and AHC were run with 2, 3, and 4 clusters and evaluated with the silhouette coefficient. Although the silhouette values of the AHC and K-Medoids methods follow a linear relationship, the AHC approach (department R: 0.88, S: 0.87, T: 0.88, and U: 0.88) yields higher silhouette values than K-Medoids (department R: 0.35; department S: 0.65 with 2 clusters; department T: 0.67 with 2 clusters; department U: 0.52). The results of both approaches are determined by the distribution of the data to be clustered rather than by the quantity of data or the number of clusters. Based on this methodology, University X can refer to the grouping outcomes for the four departments with respect to two achievements, GPA and graduation on schedule.


I. INTRODUCTION
The main metric used in universities to assess academic achievement is the GPA [1]-[3]. The GPA is the average grade received in all courses taken from the first semester to the last. A GPA is determined for each semester, while the overall GPA covers the first through the last semester [4], [5]. Each student's GPA affects how successful they become at a university. Besides GPA, graduation rates and length of study are further indicators of a university's success. University X has various faculties, and Faculty Y is one of them. Faculty Y is made up of four departments: the R, S, T, and U departments. Faculty Y has many students, and all of its departments see growth in the number of new students each year. This affects the university's performance because it creates an imbalance between new students and graduating students, which leads to poor evaluations. The ratio of lecturers to students is also relevant, since an imbalance between the number of lecturers and students can contribute to this problem. Another concern is the sheer number of students, which restricts how much time can be spent in facilities such as laboratories. Analyzing student achievement is essential to determining how effective the current educational system is [6]. Until now, student graduation data (length of study, TOEFL, and GPA) have never been combined with new student data (report score, school origin, and district origin). Faculty Y will take this into account, especially when selecting new students, by grouping the data from each department.
Clustering is a data mining technique for organizing data [7]-[9]. It identifies homogeneous groups of objects based on the values of their attributes. Clustering can assist in analyzing the variables that influence student learning results [10], [11]. Both text and numeric data can be grouped using clustering, and it has been applied to text before [12], [13]. The Analytical Hierarchy Clustering (AHC) approach and the K-Medoids method are two examples of the numerous techniques that can be applied. In earlier research on an electronic customer relationship management system, the AHC approach reached an accuracy of 66.6% and performed better in terms of time complexity [14]. The center point used by the K-Medoids technique has also been investigated [15]-[17]. This study contrasts the K-Medoids method with the AHC method for grouping data. The data used in this study are the school's name, region of origin, math score, and GPA. The test was conducted using the Silhouette Coefficient approach.
JOIV : Int. J. Inform. Visualization, 7(2) - June 2023 446-454

II. MATERIAL AND METHOD
This study aims to examine the grouping of student data using the K-Medoids method and the Analytical Hierarchy Clustering (AHC) approach. The use of clustering approaches to analyze student academic performance has been studied in a number of publications over the past few years [6]. K-Means clustering, for example, has been used to analyze student learning outcomes and performance [18]-[20]. The K-Medoids approach performs partition-based grouping and is popular due to its effectiveness, simplicity, and ease of use [21]-[24]. The AHC approach, in contrast, uses hierarchical clustering to arrange the individual data points of a cluster in the shape of a tree [6]. Both grouping techniques are appropriate for data with categorical data types [21]. For this reason, the K-Medoids method and the AHC method were used in this study.

A. K-Medoids
Data points are first chosen at random to serve as the central data (medoids) of the clusters. Every data point has the opportunity to become a medoid, but under the conditions of the K-Medoids algorithm the most centrally located data point of a cluster is the one used as its medoid [25]. The steps of the K-Medoids algorithm are as follows:
1) Initialize k cluster centers, where k is the number of clusters.
2) Assign each data point to the closest cluster, using the Euclidean distance in equation (1) to calculate the distance between data:
$$ d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{1} $$
where x_i = the i-th value of the first data point, y_i = the i-th value of the second data point, and n = the number of values (attributes) in each data point.
3) Randomly select a data point in each cluster as a candidate for a new medoid.
4) Calculate the distance from each data point in each cluster to the new medoid candidate.
5) Calculate the total deviation S = new total distance - old total distance. If S < 0, swap the current medoid with the candidate to form a new set of k medoids.
6) Repeat steps 3 to 5 until no medoid changes, which yields the clusters and their respective members.
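The steps above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the implementation used in the study: the function names `euclidean`, `assign`, and `k_medoids`, as well as the `max_iter` and `seed` parameters, are hypothetical. Because candidates are drawn at random (step 3), the loop may stop at a local optimum.

```python
import math
import random


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric vectors (equation (1))."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def assign(data, medoids):
    """Step 2: label every point with its nearest medoid; also return the total distance."""
    labels, total = [], 0.0
    for point in data:
        dists = [euclidean(point, data[m]) for m in medoids]
        nearest = min(range(len(medoids)), key=dists.__getitem__)
        labels.append(nearest)
        total += dists[nearest]
    return labels, total


def k_medoids(data, k, max_iter=200, seed=0):
    """Return (medoid indices, cluster labels, total distance)."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(data)), k)            # step 1: random initial medoids
    labels, cost = assign(data, medoids)                 # step 2: nearest-medoid assignment
    for _ in range(max_iter):
        changed = False
        for c in range(k):
            members = [i for i, lab in enumerate(labels)
                       if lab == c and i != medoids[c]]
            if not members:
                continue
            candidate = rng.choice(members)              # step 3: random candidate medoid
            trial = medoids[:c] + [candidate] + medoids[c + 1:]
            new_labels, new_cost = assign(data, trial)   # step 4: distances to candidate
            if new_cost - cost < 0:                      # step 5: accept when S < 0
                medoids, labels, cost = trial, new_labels, new_cost
                changed = True
        if not changed:                                  # step 6: stop when no medoid changed
            break
    return medoids, labels, cost
```

A call such as `k_medoids(rows, k=2)` would take `rows` as a list of numeric tuples, one per student, after the categorical attributes have been encoded.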

B. Analytical Hierarchy Clustering (AHC)
AHC is a hierarchical grouping method. Hierarchical grouping has two main approaches, namely the agglomerative (merging) approach and the divisive (split) approach [21], [26], [27]. The steps of the AHC algorithm are as follows:
1) Calculate the distance between data using the Euclidean distance formula (6):
$$ d(U, V) = \sqrt{\sum_{i=1}^{n} (u_i - v_i)^2} \tag{6} $$
where u_i = value of U on training data and v_i = value of V on test data.
2) Based on the distance matrix, group the data using Agglomerative Hierarchical Clustering (AHC) with the single linkage method in equation (7):
$$ d_{UV} = \min \{ d_{uv} \}, \quad d_{uv} \in D \tag{7} $$
where d_{UV} = the distance between the nearest (smallest-distance) neighbors of the data groups and D = the Euclidean distance proximity matrix.
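As a rough illustration of the two AHC steps, the following standard-library Python sketch builds the Euclidean proximity matrix and then repeatedly merges the two groups whose nearest members are closest (single linkage). The names `euclidean`, `single_linkage`, and `n_clusters` are assumptions made for this example; the sketch favors readability over speed and is not the study's own code.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors (equation (6))."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


def single_linkage(data, n_clusters):
    """Merge clusters bottom-up until n_clusters remain; return lists of point indices."""
    n = len(data)
    # Proximity matrix D of pairwise Euclidean distances
    D = [[euclidean(data[i], data[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]          # every point starts as its own cluster
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage (equation (7)): smallest distance between members
                d = min(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters
```

Calling `single_linkage(rows, 1)` carries the merging through to a single cluster, which corresponds to the root of the tree; stopping at 2, 3, or 4 gives the cluster counts tested in this study.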

C. Silhouette Coefficient
This method calculates the level of proximity between data or objects in a cluster. The steps in the silhouette coefficient process [16], [28], [29] are as follows:
1) Calculate the average distance from a document i to all other documents in the same cluster [17]:
$$ a(i) = \frac{1}{|A| - 1} \sum_{j \in A,\, j \neq i} d(i, j) $$
where a(i) = the mean distance of object i to all other objects in A, d(i, j) = the distance between data i and j, and A = the cluster.
2) Calculate the average distance from document i to all documents in each other cluster, and take the smallest value [17]:
$$ d(i, C) = \frac{1}{|C|} \sum_{j \in C} d(i, j), \qquad b(i) = \min_{C \neq A} d(i, C) \tag{10} $$
where d(i, C) = the average distance of document i to all objects in another cluster C, C = a cluster other than cluster A, and b(i) = the smallest average distance of the object to the objects in the other clusters.

Data on the number of incoming students and graduates in Faculty Y at University X are listed in Table 1. This investigation used the Python programming language to carry out the grouping process.
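The silhouette steps can be sketched in Python as follows. The two averages come from the steps above; the final combination s(i) = (b(i) - a(i)) / max(a(i), b(i)) is the standard silhouette definition and is not spelled out in the text, so it is labeled here as an assumption, as are the function names and the convention of scoring a single-member cluster as 0.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def silhouette_coefficient(data, labels):
    """Mean silhouette value over all points; labels[i] is the cluster of data[i]."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)   # assumed convention for a single-member cluster
            continue
        # a(i): average distance to the other members of the same cluster
        a = sum(euclidean(data[i], data[j]) for j in own) / len(own)
        # b(i): smallest average distance to the members of any other cluster
        b = min(
            sum(euclidean(data[i], data[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        # Assumed standard definition: s(i) = (b - a) / max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Values close to 1 indicate compact, well-separated clusters, while negative values indicate points that sit closer to another cluster than to their own.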

A. Calculation Process
This study uses student data from the R, S, T, and U departments of Faculty Y, with 90, 87, 76, and 30 students, respectively. The data consist of student records from the classes of 2014 and 2015 and include a number of attributes: class year, ID, name, department, entrance path, school, name of the school, district of origin, mathematics score, length of study, GPA, and TOEFL. The data load stage precedes the data processing stage; Table 2, for instance, shows the S department dataset as loaded before processing. The next step is to clean the data, although the ID and School Name attributes have already been saved. During data transformation, the One Hot Encoding method is then applied to the School attribute. The district origin attribute is mapped into three categories, as shown in Figure 1: origin one (grey) comprises North Maluku and Central Kalimantan; origin two (brown) comprises Java Island, Sumatra Island, Sulawesi Island, West Kalimantan, South Kalimantan, Bali, NTT, and NTB; and origin three (green) comprises Papua, East Kalimantan, and Maluku. This classification follows a set of criteria on the quality of the education received.
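A minimal sketch of this transformation step is given below. The region mapping follows the three origin categories listed above; everything else is an assumption made for illustration: the record field names (`school`, `origin`, `math_score`, `toefl`, `gpa`, `length_of_study`), the school categories passed in by the caller, and the premise that each district of origin has already been resolved to one of the listed provinces or islands.

```python
# Origin categories as described in the text (keys assume the district has
# already been resolved to a province or island name).
REGION_OF = {
    "North Maluku": 1, "Central Kalimantan": 1,
    "Java": 2, "Sumatra": 2, "Sulawesi": 2, "West Kalimantan": 2,
    "South Kalimantan": 2, "Bali": 2, "NTT": 2, "NTB": 2,
    "Papua": 3, "East Kalimantan": 3, "Maluku": 3,
}
REGION_LABEL = {1: "I", 2: "II", 3: "III"}


def one_hot(value, categories, prefix):
    """Return {prefix_category: 0 or 1} with a single 1 at the matching category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {f"{prefix}_{c}": int(c == value) for c in categories}


def transform(record, school_types):
    """Turn one raw student record (hypothetical field names) into a numeric row."""
    row = {
        "math_score": record["math_score"],
        "toefl": record["toefl"],
        "gpa": record["gpa"],
        "length_of_study": record["length_of_study"],
    }
    # One Hot Encoding of the School attribute
    row.update(one_hot(record["school"], school_types, "school"))
    # Origin mapped to its category, then expanded into region I, II, III columns
    region = REGION_LABEL[REGION_OF[record["origin"]]]
    row.update(one_hot(region, ("I", "II", "III"), "region"))
    return row
```

Encoding the categories as 0/1 columns keeps every attribute numeric, which is what allows the Euclidean distance to be computed over the whole row.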
The categories are listed in Table 3. As a result of this transformation, which is based on a mapping of the quality of education in Indonesia, the school origin attribute is replaced by three new attributes, namely region I, region II, and region III, which are placed into the dataset along with their values. After preprocessing, the K-Medoids step begins: the distance from each data point to every other data point is measured using the Euclidean distance, as shown in Table 4. The K-Medoids technique then processes the Euclidean distances and produces clusters whose members are similar to the medoid, that is, the central data point. For the AHC process, the single linkage method of equation (7) is applied to the Euclidean distance table. Table 5 displays the result of the single linkage calculation for the first through ninth students, after deleting the matrix rows and columns of student 5 and student 10 and adding a row and column for the merged group (student 5, student 10). The next step is to calculate the distance between the group of the fifth and tenth students and each remaining group by choosing the smallest distance. The single linkage step is repeated until only one cluster remains, which yields the grouping results of the K-Medoids and AHC methods shown in Table 7.
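The row and column update described above can be illustrated with a single merge step on a small distance matrix. This is a sketch only: the function name `merge_closest` and the student labels in the usage note are hypothetical, and the distances are made-up numbers rather than values from Tables 4 or 5.

```python
def merge_closest(labels, D):
    """Perform one single-linkage merge on a symmetric distance matrix.

    labels[i] names the group in row/column i of D (a list of lists).
    Returns the new labels and the new, smaller distance matrix.
    """
    n = len(D)
    # Find the pair of groups with the smallest distance
    a, b = min(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: D[pair[0]][pair[1]],
    )
    keep = [i for i in range(n) if i not in (a, b)]   # rows/columns that survive
    # Distance from the merged group to each remaining group: the smaller one
    merged_row = [min(D[a][i], D[b][i]) for i in keep]
    # Delete rows/columns a and b, then append the merged group's row and column
    new_D = [[D[i][j] for j in keep] + [merged_row[r]] for r, i in enumerate(keep)]
    new_D.append(merged_row + [0.0])
    new_labels = [labels[i] for i in keep] + [(labels[a], labels[b])]
    return new_labels, new_D
```

For instance, with three groups labeled `"s1"`, `"s2"`, `"s3"`, calling `merge_closest` once removes the two closest groups and adds one combined entry; repeating the call until one group remains reproduces the full merging sequence.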

B. Clustering Accuracy Test
The silhouette coefficient [30]-[32] is used to test how closely the data within each grouping resemble one another. Data from the 4 departments are used to conduct the test, with three trials for each department using 2, 3, and 4 clusters. Figures 2 through 5 display the test findings. The graphs show that the results obtained with the K-Medoids approach and the AHC method are directly proportional. Figure 3, however, shows a difference between 3 and 4 clusters: the accuracy of the AHC approach rises, while the accuracy of the K-Medoids method falls. The data distribution in the S department accounts for this discrepancy. The tests in Figures 2 to 5 show that the AHC approach is superior for grouping student data. The best outcomes for the R department are displayed in Table 8.

C. Grouping Results
All of the groupings in Table 8, which uses the AHC results, have accuracy values above 0.8, suggesting that the resulting cluster structure is strong [33]. For the R department, students from region 2, namely Java and Kalimantan, who have a math score of 80 or more and come from a senior high school can be recommended, since they graduate on time and with a GPA over 3. The distribution of the grouped data in the R department is shown in Table 9, and the graph of the results is shown in Figure 5.
For the S department, the math score has no impact on timely graduation or GPA, because students can still graduate on schedule and achieve GPAs greater than 3. The favored student background is a senior high school in the Java region. Figure 6 shows the graph of the grouping results for the S department, and Table 10 presents the distribution of the grouped data in the S department.
To graduate on time and with a GPA above 3 in the T department, a student should have a math score of at least 75 and come from a senior high school in the Java region. The distribution of the grouped data for the T department is shown in Table 11, and the dendrogram of those results is shown in Figure 7.
Students who have a math score of at least 80, are from the Java region, and come from a senior high school may be recommended for the U department, since they graduate on time with a GPA of at least 3. Additionally, students from Sumatra can graduate on time with a GPA above 3 with a math score of 75. The distribution of grouped data in the U department is shown in Table 12, and Figure 8 displays the graph of the U department data after grouping.

The AHC approach grouped student data more accurately than K-Medoids. However, the outcomes of the tests conducted with AHC and K-Medoids are directly proportional; only 1 of the 4 test datasets did not show directly proportional results. According to the AHC test results, all research data have an accuracy value greater than 0.8, so the resulting grouping has a solid structure. Based on the overall grouping results, it can be recommended that Faculty Y at University X select new students for the R, T, and U departments who come from a senior high school in the Java region and have a mathematics score of at least 80. For the S department, new students from a senior high school in the Java region should have a minimum math score of 76.