Hierarchical and K-means Clustering in the Line Drawing Data Shape Using Procrustes Analysis

Ridho Ananda; Agi Prasetiadi

doi:10.30630/joiv.5.3.532

Ridho Ananda - Faculty of Industrial Engineering and Design, Institut Teknologi Telkom Purwokerto, Purwokerto, 53147, Indonesia
Agi Prasetiadi - Faculty of Informatics, Institut Teknologi Telkom Purwokerto, Purwokerto, 53147, Indonesia

Abstract

One problem in clustering arises when the objects under study are multivariate measurements that carry geometrical information, so that the task becomes shape clustering. Because Procrustes analysis is a technique for measuring the similarity of two shapes, it offers a possible solution. This paper therefore uses Procrustes analysis as the main process in the clustering method. Four algorithms are proposed for shape clustering with Procrustes: hierarchical goodness-of-fit of Procrustes (HGoFP), k-means goodness-of-fit of Procrustes (KMGoFP), hierarchical ordinary Procrustes analysis (HOPA), and k-means ordinary Procrustes analysis (KMOPA). The algorithms were evaluated using the Rand index, Jaccard index, F-measure, and Purity. The data were a line drawing dataset of 180 drawings classified into six clusters. The results showed that HGoFP, KMGoFP, HOPA, and KMOPA all performed well on the Rand index, F-measure, and Purity, with a minimum value of 0.697. On the Jaccard index, only HGoFP, KMGoFP, and HOPA performed well, with a minimum value of 0.561; KMOPA had the worst Jaccard result, about 0.300. In terms of time complexity, HGoFP was the fastest algorithm, with a value of 4.733. Based on these results, the proposed algorithms deserve consideration as new algorithms for clustering the objects in the line drawing dataset, and HGoFP is recommended for clustering the objects in the dataset used.
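The two ingredients named in the abstract, a Procrustes-based dissimilarity between shapes and external validation indices, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes planar shapes given as equal-length lists of corresponding (x, y) landmarks, uses the standard full Procrustes distance (translation, scale, and rotation removed, reflection not removed), and defines the Rand and Jaccard indices by pair counting. All function names are hypothetical, and the F-measure is omitted because its definition varies between sources.

```python
import math
from collections import Counter
from itertools import combinations


def procrustes_distance(shape_a, shape_b):
    """Full Procrustes distance between two planar landmark shapes (assumed form)."""
    def preshape(points):
        z = [complex(x, y) for x, y in points]
        centroid = sum(z) / len(z)
        z = [p - centroid for p in z]                   # remove translation
        norm = math.sqrt(sum(abs(p) ** 2 for p in z))
        return [p / norm for p in z]                    # remove scale

    za, zb = preshape(shape_a), preshape(shape_b)
    # The modulus of the complex inner product is the best alignment
    # achievable over all rotations of one shape onto the other.
    inner = abs(sum(a.conjugate() * b for a, b in zip(za, zb)))
    return math.sqrt(max(0.0, 1.0 - inner ** 2))


def pair_counts(truth, pred):
    """Count object pairs by agreement between true classes and predicted clusters."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def rand_index(truth, pred):
    tp, fp, fn, tn = pair_counts(truth, pred)
    return (tp + tn) / (tp + fp + fn + tn)


def jaccard_index(truth, pred):
    tp, fp, fn, _ = pair_counts(truth, pred)
    return tp / (tp + fp + fn)


def purity(truth, pred):
    """Fraction of objects that belong to the majority true class of their cluster."""
    majority = 0
    for cluster in set(pred):
        members = [t for t, p in zip(truth, pred) if p == cluster]
        majority += Counter(members).most_common(1)[0][1]
    return majority / len(truth)


# Illustrative data only: a unit square and a rotated, scaled, shifted copy.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
moved = [(3, 3), (3, 5), (1, 5), (1, 3)]
print(procrustes_distance(square, moved))   # close to zero: same shape

truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]
print(rand_index(truth, pred), jaccard_index(truth, pred), purity(truth, pred))
```

A matrix of such pairwise distances could then be passed to any hierarchical or k-means style procedure. The Jaccard index ignores pairs that are separated in both partitions, so it is typically lower than the Rand index on the same clustering, which matches the pattern of values reported above.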

Keywords

Hierarchical; K-means; clustering; data shape; Procrustes.

