A Comparative Study of Feature Selection Technique for Predicting the Professional Tennis Matches Outcome in a Grand Slam Tournament

Nur Ruslan - Universiti Pertahanan Nasional Malaysia, Sungai Besi Camp 57000 Kuala Lumpur, Malaysia
Zuraini Zainol - Universiti Pertahanan Nasional Malaysia, Sungai Besi Camp 57000 Kuala Lumpur, Malaysia
Ummul Abdul Rauf - Universiti Pertahanan Nasional Malaysia, Sungai Besi Camp 57000 Kuala Lumpur, Malaysia

DOI: http://dx.doi.org/10.62527/joiv.8.1.2198


Tennis is one of the world's most played sports and attracts many spectators. Serve performance is among the most important aspects of a tennis match. This study aims to identify which serve-performance variables matter most for predicting the outcome of a match. It focuses on the four Grand Slam tournaments: the Australian Open, French Open, Wimbledon, and US Open. The data cover serve-performance measures such as Percentage First Serve In (PFSI), Percentage First Serve Won (PFSW), Percentage First Serve Return Won (PFSRW), Aces, and others. The data for one tournament comprise 254 observations. The study applied feature selection methods available in R, namely Correlation Matrix, Relative Importance Metrics, Boruta, MARS, and cForest. Selecting the variables that are most important and most strongly correlated with match status can improve the model and help produce better results. Practitioners may use this approach to bring predictions closer to the actual outcome by including the most correlated variables in the model. The results identify the first- and second-serve variables, whether points won on serve or on return of serve, as the most critical attributes in a tennis match. As a practical implication, we suggest that players pay extra attention to these factors in order to win matches.


Keywords: Serve; Correlation Matrix; Boruta; MARS; Feature selection.
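
The correlation-based step mentioned in the abstract can be illustrated with a minimal sketch. The study itself used R; the Python below is only an illustration under stated assumptions. The data are synthetic (generated here, not taken from the study), the helper names `pearson` and `rank_features` are hypothetical, and only three of the serve variables are simulated. The idea shown is simply to rank candidate variables by the absolute value of their Pearson correlation with the match outcome.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_features(features, outcome):
    """Return (name, correlation) pairs sorted by |correlation|, largest first."""
    scores = {name: pearson(values, outcome) for name, values in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Synthetic data only (an assumption for illustration, not the study's data).
random.seed(0)
n_obs = 254  # same size as one tournament's data set in the abstract
outcome = [random.randint(0, 1) for _ in range(n_obs)]  # 1 = win, 0 = loss

features = {
    # Constructed to depend strongly on the outcome.
    "PFSW": [60 + 15 * w + random.gauss(0, 5) for w in outcome],
    # Constructed to depend only weakly on the outcome.
    "PFSI": [62 + 2 * w + random.gauss(0, 6) for w in outcome],
    # Constructed to be unrelated to the outcome.
    "Aces": [random.gauss(6, 3) for _ in outcome],
}

ranking = rank_features(features, outcome)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

Because PFSW is built to track the outcome closely in this synthetic data, it ranks first. With real match data the ordering would have to be read from the results, and a correlation ranking would be only one of the several methods the study compares.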
