Data Clustering for Identification of Building Conditions Using Hybrid Multivariate Multinomial Distribution Soft Set (MMDS) Method

Rohmat Saedudin - Department of Information Systems, Telkom University, Bandung, West Java, Indonesia
Iwan Tri Riyadi Yanto - Department of Information Systems, Universitas Ahmad Dahlan, Indonesia
Avon Budiono - Department of Information Systems, Telkom University, Bandung, West Java, Indonesia
Sely Novita Sari - Institut Teknologi Nasional Yogyakarta, Indonesia
Mustafa Mat Deris - Universiti Tun Hussein Onn Malaysia, Johor, Malaysia
Norhalina Senan - Universiti Tun Hussein Onn Malaysia, Johor, Malaysia


Identifying building conditions for user safety is an urgent matter, especially in earthquake-prone areas. Clustering buildings by condition into the categories dangerous, vulnerable, normal, and safe gives residents and the government the information they need to take further action. This study introduces a new method, a hybrid of the multivariate multinomial distribution and soft set (MMDS), for clustering building conditions into the most appropriate category, consistent with the condition data recorded in the building data set. The MMDS method is important for mapping the condition of existing buildings in an area for which data sets are available. The measurements yield a condition index for each building, and the buildings are clustered according to that index value. The data set used in this study covers school buildings in the West Java region. It contains 286 school buildings described by four condition parameters: foundation, concrete reinforcement, easel pole, and roof. Using the existing data and the defined condition parameters, buildings can be classified accurately and in line with conditions on the ground. This study also compares the proposed method, MMDS, with two baseline methods, Fuzzy Centroid Clustering (FCC) and Fuzzy k-means Clustering (FKC). The results show that the proposed method outperforms the baseline methods, with a faster processing time.


Keywords: Clustering; Soft Set; Multivariate Multinomial Distribution
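
As an illustration of the general idea only, the sketch below clusters integer-coded categorical data under a multivariate multinomial model with hard assignments. It is not the authors' MMDS implementation: the function name `mmd_hard_cluster`, the smoothing parameter `alpha`, the assumption of four condition levels per attribute, and the randomly generated data are all hypothetical. The one-hot table loosely mirrors a soft-set style 0/1 representation of (object, attribute value) pairs.

```python
import numpy as np


def mmd_hard_cluster(X, n_clusters, n_levels, n_iter=50, seed=0, alpha=1.0):
    """Hard clustering of integer-coded categorical data.

    Each cluster is modelled as a product of independent categorical
    (multinomial) distributions, one per attribute. This is a generic
    sketch, not the MMDS method described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # One-hot table of shape (n, d, n_levels): 1 where object i takes
    # level l on attribute j, 0 elsewhere.
    onehot = np.eye(n_levels)[X]
    labels = rng.integers(n_clusters, size=n)  # random initial partition
    for _ in range(n_iter):
        member = np.eye(n_clusters)[labels]  # (n, k) hard memberships
        # Smoothed per-cluster level frequencies for every attribute.
        counts = np.einsum("nk,ndl->kdl", member, onehot) + alpha
        theta = counts / counts.sum(axis=2, keepdims=True)
        # Smoothed cluster proportions.
        weights = (member.sum(axis=0) + alpha) / (n + alpha * n_clusters)
        # Log-likelihood of each object under each cluster.
        loglik = np.einsum("ndl,kdl->nk", onehot, np.log(theta))
        loglik += np.log(weights)
        new_labels = loglik.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # partition no longer changes
        labels = new_labels
    return labels, theta


# Hypothetical data: 286 buildings, 4 condition parameters
# (foundation, concrete reinforcement, easel pole, roof), levels coded 0..3.
rng = np.random.default_rng(1)
X = rng.integers(0, 4, size=(286, 4))
labels, theta = mmd_hard_cluster(X, n_clusters=4, n_levels=4)
print(np.bincount(labels, minlength=4))  # cluster sizes
```

Hard assignment is used here because it keeps each iteration to a few array operations. A fuzzy method would instead keep a full membership matrix and update it on every pass.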

