Academic Performance Prediction Using Supervised Learning Algorithms in University Admission

Acep Irham Gufroni - Universitas Siliwangi, Tasikmalaya, Indonesia
Purwanto Purwanto - Universitas Diponegoro, Semarang, Indonesia
Farikhin Farikhin - Universitas Diponegoro, Semarang, Indonesia


DOI: http://dx.doi.org/10.62527/joiv.9.1.2974

Abstract


Educational institutions design their academic systems to give students the best possible learning experience. Student quality is influenced by many factors, one of which is the academic system available to them. Previous research has shown that student quality, expressed as academic achievement, can be determined from historical data on the student admission process. This research processes data from one of the admission routes previously used by Indonesian state universities, the National Selection for State University Entrance (SNMPTN), combined with cumulative grade point average (GPA) data, so that it can be used in a machine learning model. The models are built with supervised learning classification algorithms: Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGB). The experiments were carried out in three schemes that differ in the percentages of training and test data. The results show that DT produces the highest accuracy and precision, 0.79 and 0.56 respectively, while XGB produces the highest recall and F1-score, 0.35 and 0.36 respectively. Taking the highest F1-score as the selection criterion, the best model is the XGB model on the 70%-30% train-test scheme. The resulting model achieved a success rate of 77%.
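The selection rule in the abstract (prefer the model with the highest F1-score, even when another model has higher accuracy and precision) can be illustrated with a minimal sketch. It assumes a binary target for simplicity; the abstract does not state how the GPA classes were encoded or how the metrics were averaged. The labels, predictions, and helper names (`binary_metrics`, `best_by_f1`) are hypothetical and are not the paper's data or code.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(accuracy, precision, recall, f1)


def best_by_f1(results):
    """Return the (model, split) key whose Metrics has the highest F1."""
    return max(results, key=lambda key: results[key].f1)


# Hypothetical toy labels and predictions (assumption: NOT the paper's data).
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
predictions = {
    ("DT", "70-30"): [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    ("XGB", "70-30"): [1, 0, 1, 1, 0, 0, 0, 1, 0, 0],
}

results = {key: binary_metrics(y_true, y_pred)
           for key, y_pred in predictions.items()}
best = best_by_f1(results)

for key, m in results.items():
    print(key, f"acc={m.accuracy:.2f} prec={m.precision:.2f} "
               f"rec={m.recall:.2f} f1={m.f1:.2f}")
print("best by F1:", best)
```

In this toy case the conservative classifier scores higher on accuracy and precision, while the one that recovers more positive cases scores higher on recall and F1, which mirrors the trade-off reported for DT and XGB.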


Keywords


University admission; academic performance; prediction; machine learning; supervised learning
