Case Study: Using Data Mining to Predict Student Performance Based on Demographic Attributes

Nursyuhadah Alghazali Binti Muhammad Zahruddin - National Defense University of Malaysia, Sungai Besi, Kuala Lumpur, Malaysia
Nur Diyana Kamarudin - National Defense University of Malaysia, Sungai Besi, Kuala Lumpur, Malaysia
Ruzanna Mat Jusoh - National Defense University of Malaysia, Sungai Besi, Kuala Lumpur, Malaysia
Nur Aisyah Abdul Fataf - National Defense University of Malaysia, Sungai Besi, Kuala Lumpur, Malaysia
Rahmat Hidayat - Politeknik Negeri Padang, West Sumatera, Indonesia

This study predicts student performance at Universiti Pertahanan Nasional Malaysia (UPNM) from students' socio-demographic profiles and examines how a prediction algorithm can classify the student data to identify the most significant demographic attributes. Analytical patterns in academic results per batch are identified from demographic attributes and student grades to improve short-term and long-term learning and teaching plans. Understanding the likely outcome of the education process from such predictions can help UPNM lecturers raise the achievement of subsequent batches by adjusting the factors that contributed to earlier success. The study identifies and predicts student performance using data mining classification techniques, namely decision trees, neural networks, and k-nearest neighbors. This frequently adopted process comprises data selection and preparation, cleansing, incorporation of prior knowledge datasets, and interpretation of the resulting solutions. The study presents simplified output from each data mining method to make the results easier to understand and to determine the best-performing method. The results show that the critical attributes influencing student performance are gender, age, and student status. The neural network method has the lowest root mean square error (RMSE), whereas the decision tree method has the highest, which indicates lower predictive accuracy for the decision tree. The correlation coefficient recorded for the k-nearest neighbor method is less than one.
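The two evaluation measures named in the abstract, RMSE and the correlation coefficient, can be illustrated with a minimal sketch. This is not the study's WEKA workflow: the function names `rmse` and `pearson_r` and all the numbers below are hypothetical and chosen only to show how a lower RMSE corresponds to predictions that sit closer to the actual values.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(actual, predicted):
    """Pearson correlation coefficient; 1.0 means a perfect positive linear fit."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical grade-point values, NOT data from the study.
actual_grades = [3.0, 2.5, 3.5, 4.0]
model_a_predictions = [3.1, 2.4, 3.6, 3.9]  # small errors
model_b_predictions = [2.0, 3.5, 2.5, 3.0]  # large errors

print("Model A RMSE:", rmse(actual_grades, model_a_predictions))
print("Model B RMSE:", rmse(actual_grades, model_b_predictions))
print("Model A correlation:", pearson_r(actual_grades, model_a_predictions))
```

Under these assumed values, the model with the smaller RMSE is the one whose predictions deviate less from the actual grades, which is the sense in which the abstract ranks the three methods.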


Demographic profiling; student performance prediction; UPNM; WEKA; data mining; knowledge discovery database
