Feature Selection Technique to Improve the Instances Classification Framework Performance for Quran Ontology

Yuli Purwati - Universitas AMIKOM Purwokerto, Purwokerto, Indonesia
Fandy Utomo - Universitas AMIKOM Purwokerto, Purwokerto, Indonesia
Nikmah Trinarsih - Universitas AMIKOM Purwokerto, Purwokerto, Indonesia
Hanif Hidayatulloh - Universitas AMIKOM Purwokerto, Purwokerto, Indonesia



DOI: http://dx.doi.org/10.30630/joiv.7.2.1195

Abstract


The Al-Quran is the sacred book of Muslims. It conveys God's word in the form of orders, instructions, and guidelines for people to follow in order to have happy lives both here and in the afterlife. Several earlier studies have used ontologies to store the knowledge found in the Quran. The previous study focused on extracting the relationship between classes and instances, or the "is-a relation", by classifying instances based on the referenced class. In performance testing of that instances classification framework, the accuracy of the Support Vector Machine (SVM) with Term Frequency-Inverse Document Frequency (TF-IDF) and stemming dropped to 65.41% when the test data size was increased to 30%. Likewise, for the Backpropagation Neural Network (BPNN) with TF-IDF and stemming on the Indonesian Quran translation dataset, the accuracy dropped to 57.86% at a test data size of 30%. Instances classification based on the thematic topics of the Quran aims to connect verses (instances) to topics (classes) to give an overall picture of each topic and provide a better understanding to users. This study aims to apply a feature selection technique to the instances classification framework for the Al-Quran ontology and to analyze its impact on the framework when the dataset and the training data are small. The instances classification framework in this study consists of several stages: text preprocessing, feature extraction, feature selection, and instances classification. We applied Chi-Square as the feature selection technique and SVM and BPNN as the classifiers. Based on the experiment results, feature selection using Chi-Square increases precision, F-measure, and accuracy for test data sizes from 40% to 60% in all datasets. Feature selection using Chi-Square with the SVM classifier provides the highest precision, 64.36%, at a test data size of 60% on the Quran Tafsir dataset from the Ministry of Religious Affairs of the Republic of Indonesia. Furthermore, feature selection with the BPNN classifier yields the highest accuracy, 63.09%, at a test data size of 60% on the same dataset.
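
The feature selection stage named above can be illustrated with a minimal sketch of Chi-Square term scoring. This is not the authors' implementation: the whitespace tokenizer, the toy English documents, the topic labels, and the function names `chi_square_scores` and `select_top_k` are all assumptions made for illustration, and the stemming, TF-IDF weighting, and SVM/BPNN stages are omitted. For each term and class, the score is computed from a 2x2 document-count contingency table, and each term keeps its maximum score over the classes.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Return {term: max chi-square score over classes}.

    Scores use document-level term presence (a 2x2 contingency table
    per term/class pair), not term frequencies.
    """
    n = len(docs)
    # Assumption: naive lowercase whitespace tokenization, no stemming.
    doc_terms = [set(d.lower().split()) for d in docs]
    vocab = set().union(*doc_terms)
    class_sizes = Counter(labels)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls, n_cls in class_sizes.items():
            # a: docs in cls with term, b: docs outside cls with term
            a = sum(1 for t, y in zip(doc_terms, labels) if term in t and y == cls)
            b = sum(1 for t, y in zip(doc_terms, labels) if term in t and y != cls)
            c = n_cls - a          # docs in cls without term
            d = n - n_cls - b      # docs outside cls without term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - c * b) ** 2 / denom)
        scores[term] = best
    return scores


def select_top_k(docs, labels, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy data standing in for verses (instances) and topics (classes).
docs = [
    "prayer and charity are commanded",
    "establish prayer at dawn",
    "fasting is prescribed in ramadan",
    "complete the fasting until night",
]
labels = ["prayer", "prayer", "fasting", "fasting"]

print(select_top_k(docs, labels, 2))
```

In this toy example, the two terms that appear in every document of one topic and in none of the other receive the highest scores, so they are the ones retained. In a full pipeline, only the retained terms would be passed on to feature weighting and the classifier.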


Keywords


Ontology population; Chi-Square; machine learning; SVM; BPNN

