Hierarchical and K-means Clustering in the Line Drawing Data Shape Using Procrustes Analysis

Ridho Ananda - Faculty of Industrial Engineering and Design, Institut Teknologi Telkom Purwokerto, Purwokerto, 53147, Indonesia
Agi Prasetiadi - Faculty of Informatics, Institut Teknologi Telkom Purwokerto, Purwokerto, 53147, Indonesia


DOI: http://dx.doi.org/10.30630/joiv.5.3.532

Abstract


One problem in clustering arises when the objects under study are multivariate measurements that carry geometrical information, so that the task becomes shape clustering. Procrustes analysis, a technique for measuring the similarity of two shapes, offers a solution, and this paper therefore uses it as the core step of the clustering method. Four algorithms are proposed for shape clustering with Procrustes: hierarchical goodness-of-fit of Procrustes (HGoFP), k-means goodness-of-fit of Procrustes (KMGoFP), hierarchical ordinary Procrustes analysis (HOPA), and k-means ordinary Procrustes analysis (KMOPA). The algorithms were evaluated using the Rand index, Jaccard index, F-measure, and Purity. The data were a line drawing dataset of 180 drawings classified into six clusters. The results showed that HGoFP, KMGoFP, HOPA, and KMOPA all performed well on the Rand index, F-measure, and Purity, with a minimum value of 0.697. On the Jaccard index, only HGoFP, KMGoFP, and HOPA gave good clustering results, with a minimum value of 0.561. KMGoFP had the worst result on the Jaccard index, about 0.300. In terms of time complexity, HGoFP was the fastest algorithm, with a value of 4.733. Based on these results, the proposed algorithms deserve consideration as new algorithms for clustering the objects in the line drawing dataset, and HGoFP in particular is recommended for clustering the objects in the dataset used.
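To make the two ingredients named in the abstract concrete, the following minimal sketch shows a Procrustes-type distance between two shapes and two of the evaluation indices (Rand and Jaccard). It is an illustration under stated assumptions, not the authors' HGoFP, KMGoFP, HOPA, or KMOPA formulation: it assumes 2-D shapes given as equal-length lists of matched landmark points, uses the standard full Procrustes distance for planar shapes (invariant to translation, scale, and rotation, but not reflection), and all function names are hypothetical.

```python
import math
from itertools import combinations


def preshape(points):
    """Center 2-D landmarks and scale them to unit size (stored as complex numbers)."""
    z = [complex(x, y) for x, y in points]
    mean = sum(z) / len(z)
    z = [p - mean for p in z]
    norm = math.sqrt(sum(abs(p) ** 2 for p in z))
    return [p / norm for p in z]


def procrustes_distance(a, b):
    """Full Procrustes distance between two planar shapes with matched landmarks.

    Assumption: landmarks correspond one-to-one; reflections are not removed.
    """
    za, zb = preshape(a), preshape(b)
    inner = sum(p.conjugate() * q for p, q in zip(za, zb))
    # Clamp at zero to absorb floating-point rounding for identical shapes.
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))


def pair_counts(truth, pred):
    """Count object pairs by whether they share a cluster in truth and/or pred."""
    both = only_truth = only_pred = neither = 0
    for i, j in combinations(range(len(truth)), 2):
        same_t = truth[i] == truth[j]
        same_p = pred[i] == pred[j]
        if same_t and same_p:
            both += 1
        elif same_t:
            only_truth += 1
        elif same_p:
            only_pred += 1
        else:
            neither += 1
    return both, only_truth, only_pred, neither


def rand_index(truth, pred):
    """Fraction of pairs on which the two partitions agree."""
    a, b, c, d = pair_counts(truth, pred)
    return (a + d) / (a + b + c + d)


def jaccard_index(truth, pred):
    """Pairs grouped together in both partitions over pairs grouped in either."""
    a, b, c, _ = pair_counts(truth, pred)
    return a / (a + b + c) if (a + b + c) else 1.0


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    moved = [(5, 5), (5, 7), (3, 7), (3, 5)]  # rotated 90 degrees, scaled by 2, shifted
    print(procrustes_distance(square, moved))  # close to zero
    print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
    print(jaccard_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

A matrix of such pairwise distances could then be passed to a hierarchical or k-means-style procedure, and the resulting labels compared with the known classes using the indices above.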


Keywords


Hierarchical; K-means; clustering; data shape; Procrustes.

