Flexible Semantic Qur’an Question Answering Using Graph-Based Summarization and KNN
DOI: http://dx.doi.org/10.62527/joiv.8.4.1907
Abstract
Qur'an-based research has attracted growing attention in computer science, with much of it focused on ontology-based representation of the Qur'an. Semantic search is well suited to extracting information from the Qur'an, whose knowledge and language are complex. This work aims to develop flexible semantic Qur'an question answering by applying graph-based summarization and K-nearest neighbors (KNN) to add flexibility to semantic search in the Indonesian language. The Qur'an itself, however, is written in a distinctive form of Arabic, which is part of the complexity of this work. The graph-based summarization method effectively summarizes a complex question, as shown by ROUGE testing with F1, precision, and recall scores of 72%, 62%, and 72%, respectively. The KNN method, evaluated by the expert, yielded average approval percentages of 62.11%, 66.15%, and 19.61% for the first, second, and third topics. As for other issues related to the questions, 70% need to be displayed. Analysis of the results indicates that the classification step needs to be improved for the very small dataset. This work contributes to Qur'an question answering, since the Qur'an differs from other question-answering content: it is open to a large number of interpretations. Much work remains for the future. The dataset is also limited by the scope of this research, which covers only the pillars of Islam, so many topics still need to be added to the dataset.
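To make the ROUGE figures reported in the abstract easier to interpret, the following is a minimal sketch of how precision, recall, and F1 could be computed for a single summarized question. It assumes ROUGE-1 (unigram overlap) with lowercased whitespace tokenization; the abstract does not state which ROUGE variant or tokenizer was used, and the function name `rouge_1` and the Indonesian example strings are hypothetical.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap precision, recall, and F1 (an assumed ROUGE-1 variant)."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((cand_counts & ref_counts).values())

    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: a system summary versus an expert-written reference summary.
scores = rouge_1("apa hukum puasa ramadan", "apa hukum puasa")
print(scores)
```

In this sketch, precision is the share of summary words that also occur in the reference, and recall is the share of reference words recovered by the summary. Corpus-level scores such as those in the abstract are typically averages over many question and summary pairs.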