Customer Loyalty Prediction for Hotel Industry Using Machine Learning Approach

Iskandar Zul Putera Hamdan - Universiti Tun Hussein Onn Malaysia, Batu Pahat, 86400, Malaysia
Muhaini Othman - Universiti Tun Hussein Onn Malaysia, Batu Pahat, 86400, Malaysia
Yana Mazwin Mohmad Hassim - Universiti Tun Hussein Onn Malaysia, Batu Pahat, 86400, Malaysia
Suziyanti Marjudi - Universiti Tun Hussein Onn Malaysia, Batu Pahat, 86400, Malaysia
Munirah Mohd Yusof - Universiti Tun Hussein Onn Malaysia, Batu Pahat, 86400, Malaysia

Machine learning is now used across many industries, including tourism and hospitality. This project applies classification to predict hotel customers' loyalty and to develop viable strategies for managing and structuring customer relationships. The research follows the CRISP-DM methodology, and the three chosen classification algorithms are random forest, logistic regression, and decision tree. The study investigates key characteristics of merchants' customers' behavior, interests, and preferences through a real-world case study using a hotel booking dataset from the C3 Rewards and C3 Merchant systems. Following a comprehensive investigation of prospective preferences in the pre-processing phase, the best-performing machine learning algorithms are identified and assessed for forecasting customer loyalty in the hotel business. The study's outcomes were recorded and examined further before hotel operators used them as a reference. The chosen algorithms are implemented in Python, and the results are evaluated using the confusion matrix, specifically in terms of precision, recall, and F1-score. At the end of the experiment, the logistic regression, decision tree, and random forest algorithms achieved accuracies of 57.83%, 71.44%, and 69.91%, respectively. To overcome the limitations of this study, additional datasets or improved algorithms could be used to better understand each algorithm's benefits and limitations and to achieve further advancement.
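The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes a binary loyal / not-loyal label, which the abstract does not state explicitly. The function names (`confusion_counts`, `scores`) and the ten example labels are hypothetical and are not taken from the C3 dataset or the study's code. The sketch derives accuracy, precision, recall, and F1-score from the four cells of a confusion matrix.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count the four cells of a binary confusion matrix.

    Returns (tp, fp, fn, tn), treating `positive` as the "loyal" class.
    """
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # predicted loyal, actually loyal
            else:
                fp += 1  # predicted loyal, actually not loyal
        else:
            if truth == positive:
                fn += 1  # predicted not loyal, actually loyal
            else:
                tn += 1  # predicted not loyal, actually not loyal
    return tp, fp, fn, tn


def scores(tp, fp, fn, tn):
    """Derive accuracy, precision, recall, and F1 from confusion-matrix cells.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for illustration only: 1 = loyal, 0 = not loyal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

cells = confusion_counts(y_true, y_pred)
result = scores(*cells)
print(cells)
print(result)
```

With these made-up labels, the confusion matrix has four true positives, one false positive, one false negative, and four true negatives. Reporting precision, recall, and F1 alongside accuracy matters when the loyal and non-loyal classes are imbalanced, since accuracy alone can then be misleading.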


Machine learning; classification; CRISP-DM
