Leveraging Various Feature Selection Methods for Churn Prediction Using Various Machine Learning Algorithms

Kusnawi Kusnawi - Universitas Amikom Yogyakarta, Sleman, 55285, Indonesia
Joang Ipmawati - Universitas Nahdlatul Ulama Yogyakarta, Sleman, 55291, Indonesia
Bima Pramudya Asadulloh - Universitas Amikom Yogyakarta, Sleman, 55285, Indonesia
Afrig Aminuddin - Universitas Amikom Yogyakarta, Sleman, 55285, Indonesia
Ferian Fauzi Abdulloh - Universitas Amikom Yogyakarta, Sleman, 55285, Indonesia
Majid Rahardi - Universitas Amikom Yogyakarta, Sleman, 55285, Indonesia


DOI: http://dx.doi.org/10.62527/joiv.8.2.2453

Abstract


This study examines the effect of customer experience on customer retention at DQLab Telco, using machine learning techniques to predict customer churn. The dataset covers 6590 DQLab Telco customers and describes their demographics, service usage, and satisfaction through features such as gender, tenure, phone service, internet service, monthly charges, and total charges. The study applies several feature selection methods, such as ANOVA, Recursive Feature Elimination, Feature Importance, and Pearson Correlation, to select the most relevant features for churn prediction. It also compares three machine learning algorithms, namely Logistic Regression, Random Forest, and Gradient Boosting, to build and evaluate the prediction models. Logistic Regression achieves its highest accuracy, 79.47%, without feature selection, while Random Forest with Feature Importance and Gradient Boosting with Recursive Feature Elimination achieve accuracies of 77.60% and 79.86%, respectively. The study also identifies the features that most influence customer churn, such as monthly charges, tenure, partner, senior citizen, internet service, paperless billing, and TV streaming. These results give DQLab Telco a basis for churn reduction strategies built on predictive models and influential features. They also suggest that feature selection and the choice of machine learning algorithm play a vital role in churn prediction accuracy and should be tailored to the data context.
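As a rough illustration of one of the feature selection methods named above, the sketch below ranks features by the absolute value of their Pearson correlation with the churn label and keeps the top k. It is a minimal sketch, not the study's implementation: the helper names `pearson` and `select_by_correlation`, the column names, and the toy values are all assumptions invented for this example, and none of the numbers come from the DQLab Telco dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def select_by_correlation(features, target, k):
    """Return the k feature names with the largest |r| against the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Synthetic toy columns, invented for illustration only (hypothetical values,
# not taken from the DQLab Telco data). Churn is encoded as 1, retained as 0.
toy_features = {
    "tenure": [1, 2, 3, 24, 36, 48, 60, 72],
    "monthly_charges": [90, 85, 95, 50, 45, 60, 40, 55],
    "gender": [0, 1, 0, 1, 0, 1, 0, 1],
}
toy_churn = [1, 1, 1, 0, 0, 0, 0, 0]

selected = select_by_correlation(toy_features, toy_churn, k=2)
print(selected)
```

A correlation filter like this scores each feature independently of any model, whereas Recursive Feature Elimination and Feature Importance depend on a fitted model, which is one reason the best selector can differ from one algorithm to another.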


Keywords


Machine Learning; Feature Selection; Customer Experience

