The Implementation of the K-Medoid Clustering for Grouping Hearing Loss Function on Excessive Smartphone Use

Abstract: During the COVID-19 pandemic, smartphones became a learning tool for students throughout Indonesia, including high school students. Students use smartphones to submit assignments, attend lessons via video call, and take online exams. Prolonged smartphone use, from the start of school hours in the morning to study hours in the evening, harms the ear health of high school students in Padang, and excessive use has decreased students' hearing function. Therefore, this study aims to group the audiometry results of high school students in Padang who have hearing loss. An audiogram only reports the result of a frequency test of the subject's hearing in the left and right ears. Conventionally, an otolaryngologist makes the final decision on the degree of hearing loss. This research proposes an automatic classification of audiometry results using machine learning methods, and K-Medoids clustering was selected to classify the audiometry data. Of 210 audiometry records, 91 were confirmed by an otolaryngologist as valid. Using K-Medoids clustering, 93 records were classified into Normal Hearing, Mild Hearing Loss, and Moderate Hearing Loss. The proposed model successfully grouped the audiometry data into three categories. A confusion matrix was applied to measure model performance, giving 28.3% accuracy, 64.3% precision, and 21.4% recall.


I. INTRODUCTION
Smartphone technology is currently developing very rapidly. As time passes, the number of smartphone users, ranging from children to parents, keeps increasing. A smartphone can support almost all human activities, from sending messages and making calls to chatting, playing games, and more. Smartphones are considered so vital that most people now own one.
In the digital era, smartphones have become a primary need for urban people to communicate, find information, and meet other needs. People can become attached to their smartphones without realizing it [1], [2], [3]. Alongside users' focus and enjoyment, habit is one of the factors contributing to smartphone addiction. Smartphone addiction has a destructive impact on health [4], [5], including on the ears. Keeping the ears clean and healthy is essential to prevent various ear problems, including ear infections, tinnitus, deafness, and sudden deafness. Ear problems not only affect hearing but can also disrupt the body's balance. A study of employees' audiograms found some differences across professional categories, with worse hearing in the left ear in almost all categories; only a few publications in the medical literature have reported worse hearing levels in the left ear, which is significant information regarding the asymmetry of occupational noise-induced hearing loss [6]. Therefore, for the auditory and vestibular organs to continue to function correctly, the health of the ears must be appropriately maintained.
We are investigating the impact of smartphone use on high school students, but data processing is still manual. This becomes a challenge when the amount of stored data is large, so a system is required to categorize the data of students who have been affected by smartphone use. In determining hearing loss, there have been discrepancies in the frequency and stage of threshold shift employed and in the suggested course of action once hearing loss is identified [7]. Figure 1 shows six categories of audiogram results [8]: Normal Hearing Ability, Mild Hearing Loss, Moderate Hearing Loss, Moderately Severe Hearing Loss, Severe Hearing Loss, and Profound Hearing Loss. During the Covid-19 pandemic, connections were found between daily verified Covid-19 cases and smartphone use. The notable indicators are the use of smartphone memory, Wi-Fi, and network switches. The most robust connection is between the smartphone use pattern and Covid-19 cases, which in turn reveals the state of an outbreak [9], [10]. Recognizing the advantages of using hearing aids in quiet environments can help people with fewer communication or listening demands to decide on a hearing aid [11]. Many attempts have been made to improve classifiers for unbalanced data because of their significance and the complexity of data classification [12]. Cluster analysis is a suitable method for database analysis because the average values of a database that contains variable and considerably varied data cannot be considered representative [13].
Clustering is a method or algorithm for grouping data or objects based on their similarities [14]. It is part of data mining and is used to obtain compelling patterns within voluminous data. This method can be used to classify data on high school students whose ear health is affected by smartphone use. Hence, the K-Medoids clustering technique is considered for assigning the cluster categories [15], [16]. K-Medoids minimizes the overall distance between the objects in each cluster and the cluster's medoid. The algorithm proceeds in two stages; in the first, k representative objects are found through an iterative selection of representative objects [17]. Both K-Medoids clustering and K-Means clustering start by randomly selecting k initial centers (medoids, in the case of K-Medoids) that represent the k clusters [18].
Clustering can be considered the most important unsupervised learning problem; as with every other problem of this kind, it deals with finding a structure in a collection of unlabeled data. Broadly, clustering can be defined as "the process of grouping objects into groups whose members are related in some way." A cluster can therefore be characterized as a group of objects that are "similar" to one another and "dissimilar" to the objects in other clusters. Clustering contrasts with classification, where the objects are assigned to predetermined classes. The main advantage of clustering is that it allows compelling patterns and structures to be discovered directly from massive data sets with little to no prior knowledge. However, the results can be arbitrary and depend on the implementation [19]. In text summarization, clustering aims to identify the significant subjects and subtopics in a document or collection of documents [20].
The reduced confusion matrix, a unique matrix, is the outcome of class grouping using the reduction method [21]. The confusion matrix shows the number of instances assigned to each class, and its purpose is to calculate the classification accuracy [22]. To determine the final classification result, the outputs of various classifiers can be combined in a decision-level fusion system according to their confusion matrices [23]. The confusion matrix is considered an effective technique for assessing how well a classifier distinguishes between tuples of various classes, and it is typically used to quantify the accuracy of a classifier or predictor [24].

A. Data Collections
Data collection started with screening the students, who were asked to fill out a questionnaire. The questionnaire covered age, study habits, duration of smartphone usage, and visual ability. Based on the questionnaires, the medical staff decided which students would continue to the audiogram test. In most cases, the selected students used a smartphone for more than two hours and had visual ability problems.
The audiometry test was conducted at "UPTD Keselamatan dan Kesehatan Kerja" Padang, a unit responsible for implementing and overseeing workplace safety and health measures within its region. The test was supervised by an otolaryngologist, a doctor who treats issues in the ears, nose, and throat (ENT doctor). Based on the first screening, 210 audiometry records were obtained. Finally, 93 audiometry records were selected as the dataset for training and testing the model. An otolaryngologist supervised the selection process.

B. K-Medoids
The K-Medoids algorithm is a partitional clustering technique that minimizes the distance between the points assigned to a cluster and the cluster's designated center. K-Medoids differs from the K-Means algorithm mainly in that it selects actual data points as the centers (medoids). In [25], the purity value is used to evaluate the K-Medoids algorithm on several different data formats in order to determine which format gives better clustering outcomes.
The K-Medoids algorithm is employed to locate the medoid at the central location of each cluster [26]. As a classic partitioning technique, K-Medoids divides a dataset of objects into k groups, where k is known a priori. Compared with K-Means, K-Medoids is more robust to noise and outliers because it minimizes a sum of pairwise dissimilarities instead of a sum of squared Euclidean distances. A medoid is the most centrally located object of a cluster, that is, the object with the smallest average dissimilarity to all other objects in the cluster. The flowchart of the K-Medoids algorithm can be seen in Figure 1.
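The robustness to outliers can be illustrated with a minimal sketch. The threshold values below are hypothetical (they are not taken from the study data), and the helper name `medoid` is our own. The sketch compares the mean, which K-Means uses as a center, with the medoid when one extreme record is present.

```python
import numpy as np

def medoid(points: np.ndarray) -> np.ndarray:
    """Return the row of `points` with the smallest total distance to all rows."""
    # Pairwise Euclidean distances between all rows, shape (n, n).
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    return points[dist.sum(axis=1).argmin()]

# Hypothetical (left ear, right ear) thresholds in dB; the last row is an outlier.
thresholds = np.array([
    [20.0, 22.0],
    [21.0, 20.0],
    [19.0, 21.0],
    [22.0, 23.0],
    [80.0, 85.0],
])

mean_center = thresholds.mean(axis=0)   # pulled toward the outlier
medoid_center = medoid(thresholds)      # remains an actual, typical record

print("mean  :", mean_center)
print("medoid:", medoid_center)
```

In this toy example the mean moves to roughly (32.4, 34.2), a value that matches no student, while the medoid stays at (20, 22), one of the typical records.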
The Euclidean distance is applied to allocate each data point (object) to the closest cluster [27], with the equation:

$$d_{ik} = \sqrt{\sum_{j=1}^{p} \left(x_{ij} - c_{kj}\right)^{2}}$$

Description:
$d_{ik}$ = Euclidean distance between the i-th observation and the center of the k-th cluster, taken over the j-th variables
$x_{ij}$ = object on the i-th observation on the j-th variable
$c_{kj}$ = center of the k-th group on the j-th variable
$p$ = the number of observed variables
$n$ = the number of observations
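The assignment and update steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the system built in this study: it uses the simple alternating variant of K-Medoids (assign each object to the nearest medoid, then re-pick the medoid of each cluster) rather than the classic swap-based PAM procedure. The function names, the `init_idx` parameter, and the sample thresholds are hypothetical.

```python
import numpy as np

def pairwise_euclidean(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Euclidean distances between rows of a (n, p) and rows of b (k, p)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def k_medoids(x, k, init_idx=None, max_iter=100, seed=0):
    """Alternating K-Medoids; returns (medoid indices, cluster labels)."""
    x = np.asarray(x, dtype=float)
    if init_idx is None:
        # Random starting medoids, as in the usual description of the algorithm.
        rng = np.random.default_rng(seed)
        medoid_idx = rng.choice(len(x), size=k, replace=False)
    else:
        medoid_idx = np.array(init_idx, dtype=int)

    for _ in range(max_iter):
        # Assignment step: each object goes to its closest medoid.
        labels = pairwise_euclidean(x, x[medoid_idx]).argmin(axis=1)
        new_idx = medoid_idx.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster is empty
            # Update step: the member with the smallest total distance
            # to the other members becomes the new medoid.
            within = pairwise_euclidean(x[members], x[members]).sum(axis=1)
            new_idx[c] = members[within.argmin()]
        if np.array_equal(new_idx, medoid_idx):
            break  # medoids no longer change
        medoid_idx = new_idx

    labels = pairwise_euclidean(x, x[medoid_idx]).argmin(axis=1)
    return medoid_idx, labels

# Hypothetical (left, right) thresholds in dB forming three loose groups.
data = np.array([
    [20, 20], [22, 21], [21, 23],
    [30, 31], [32, 30], [31, 33],
    [45, 44], [47, 46], [46, 48],
])
medoid_idx, labels = k_medoids(data, k=3, init_idx=[0, 3, 6])
print(medoid_idx, labels)
```

The starting medoids are fixed here only to make the example reproducible; with random starting medoids, this variant can stop at a local optimum, so in practice several restarts are common.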

C. Confusion Matrix
We employ the confusion matrix method to evaluate the effectiveness of the classification system in terms of the accuracy of the results from the two devices [28]. In a multiclass classification task, the confusion matrix is a prominent tool for assessing performance because it quantifies the classification overlap for the instances labeled as each class [29].
The confusion matrix is a table commonly used to measure the performance of machine learning classification models; it describes in detail which data are classified correctly and which incorrectly. It is also a predictive analytics tool that displays and compares the actual values with the values predicted by the model and yields metrics such as Accuracy, Precision, Recall, and F1-Score or F-Measure.
1) Accuracy: Accuracy measures how close the predicted values are to the actual values, and it can be computed once the correctly classified data are established. Accuracy also reflects the degree to which results remain similar over repeated tests and is not a category for boundary values [30].
2) Precision: Precision measures the relevance of the information returned by the system, by comparing the amount of relevant information returned with the total amount of information returned.
3) Recall: Recall compares the amount of relevant information returned with the total amount of relevant information in the information set (both retained and discarded).

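4) F1-Score or F-Measure: The F1-Score, also called the F-Measure, is determined from the predicted and actual result categories by combining the precision and recall results.
The confusion matrix uses the following terms:
TP (True Positive) = the number of class 1 documents that are correctly classified as class 1
TN (True Negative) = the number of class 0 documents that are correctly classified as class 0
FP (False Positive) = the number of class 0 documents that are misclassified as class 1
FN (False Negative) = the number of class 1 documents that are misclassified as class 0
The confusion matrix formulas for calculating accuracy, precision, and recall are as follows.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$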
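As a worked illustration, the four metrics can be computed from the confusion matrix counts with the standard definitions, where the F1-Score is the harmonic mean of precision and recall. The function name and the counts below are hypothetical; they are not the counts obtained in this study.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall, and F1 from binary confusion matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=6, tn=10, fp=2, fn=2)
print(metrics)
```

The guards against empty denominators return 0.0 by convention; other conventions (for example, reporting the metric as undefined) are equally defensible.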

III. RESULT AND DISCUSSIONS
The sample in this research was drawn from the population of the most favored and least favored senior high school clusters. Based on these clusters, six senior high schools in Padang were included: SMA 1, SMA 3, SMA 5, SMA 6, SMA 8, and SMA 10. Samples were taken by accidental sampling, with 35 male and female students targeted in each school, for a total sample of 210. The 210 students were then re-screened to examine the behavior and intensity of their smartphone use. Students who reported more than 4 hours of use fell into the heavy category and underwent measurement, giving a total sample of 93 students.
Figures 5 and 6, and likewise Figs. 7 and 8, show the difference between the results obtained using the RapidMiner application and the system built, with cluster 0 (normal hearing ability category), cluster 1 (mild hearing loss category), and cluster 2 (moderate hearing loss category).

E. Confusion Matrix Testing
Based on the tests conducted using the K-Medoids method, the confusion matrix is employed to compute the method's performance. The data source contains interpretations for the left and right ears. Each interpretation takes the value R (Normal Hearing Ability), N (Mild Hearing Loss), or T (Moderate Hearing Loss); in the confusion matrix, R and T are treated as negative and N is treated as positive. Likewise, the clustering data has the categories R (Normal), N (Mild), and T (Moderate), where R and T are treated as negative and N is treated as positive, as seen in Table 15. Using the confusion matrix method, an accuracy of 28.3%, a precision of 64.3%, and a recall of 21.4% were obtained.
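The binarization described above can be sketched as follows: N is the positive class, and both R and T count as negative. The function name and the label lists are hypothetical and serve only to show how the four counts are tallied.

```python
def binary_confusion(actual, predicted, positive="N"):
    """Count TP, TN, FP, FN when one label is positive and all others negative."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            counts["TP"] += 1
        elif a == positive:
            counts["FN"] += 1   # positive record placed in a negative cluster
        elif p == positive:
            counts["FP"] += 1   # negative record placed in the positive cluster
        else:
            counts["TN"] += 1   # both sides negative (R or T)
    return counts

# Hypothetical interpretations (actual) and cluster categories (predicted).
actual    = ["N", "N", "R", "T", "N", "R"]
predicted = ["N", "R", "R", "N", "T", "T"]

counts = binary_confusion(actual, predicted)
print(counts)
```

Note that under this scheme a record interpreted as R but clustered as T still counts as a true negative, because both labels fall in the negative group; the reported metrics therefore describe only how well category N is separated from the rest.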

IV. CONCLUSION
This research found that, with the K-Medoids system, schools can identify students whose hearing has been affected. The K-Medoids method can be applied in this system to determine each student's hearing category. The clustering results depend on the difference between each data point being iterated and the selected medoids, provided that the data points being iterated are not among the medoids selected initially. Among the high school students in Padang in the data, none falls into cluster 4 (moderately severe hearing loss), cluster 5 (severe hearing loss), or cluster 6 (profound hearing loss), meaning that none of the students has a very serious hearing disorder. The data of students from SMA 3, SMA 8, and SMA 10 could not be clustered because the minimum and maximum values were the same when the data was transformed. Therefore, the data must be normalized before the process continues.

Fig. 3 Category Page

B. Student Detail Page
Figure 2 shows the student's identity along with the results of the audiogram examination and interpretation.

min = minimum value of all data per column
max = maximum value of all data per column
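The per-column minimum and maximum suggest a min-max transformation of the form $x' = (x - \min) / (\max - \min)$. This is our assumption; the exact normalization formula is not recoverable from the text. The sketch below applies it per column and guards against the case where the minimum and maximum of a column are equal, which would otherwise cause a division by zero. The function name and sample values are hypothetical.

```python
import numpy as np

def min_max_normalize(data) -> np.ndarray:
    """Scale each column to [0, 1]; constant columns are mapped to 0."""
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    span = col_max - col_min
    # Where max == min, divide by 1 instead of 0 so the column becomes all zeros.
    safe_span = np.where(span == 0, 1.0, span)
    return (data - col_min) / safe_span

# Hypothetical thresholds in dB; the second column is constant.
sample = [[10, 25], [20, 25], [30, 25]]
normalized = min_max_normalize(sample)
print(normalized)
```

Mapping a constant column to zero is one possible convention; dropping such a column before clustering is another, since it carries no information for separating the records.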

Table 6 shows normalized transformation data, and Table 7 shows advanced normalized transformation data.