Classification of Predicting Customer Ad Clicks Using Logistic Regression and k-Nearest Neighbors

Yasi Dani - Bina Nusantara University, Jakarta, Indonesia
Maria Ginting - Bina Nusantara University, Jakarta, Indonesia

Conventional marketing has largely shifted to online (digital) marketing, which requires internet access. Online marketing has many advantages, especially cost efficiency and fast delivery of information to the public. Therefore, many companies are interested in online marketing and advertising on social media platforms and websites. However, one challenge for companies is determining the right target consumers: if they target consumers who are not interested in buying the product, advertising costs will be high. One common measure in online advertising is ad clicks, that is, how many users click on an online ad. Companies therefore need a click prediction system to identify the right target consumers, and many types of advertisers and search engines rely on modeling to predict ad clicks accurately. This paper constructs a customer ad-click prediction model using a machine learning approach, which has become increasingly sophisticated at predicting the probability of a click. We propose two classification algorithms: the logistic regression (LR) classifier, which produces probabilistic outputs, and the k-nearest neighbors (k-NN) classifier, which produces non-probabilistic outputs. Furthermore, this study compares the two algorithms and determines the better one based on performance. We compute the confusion matrix and several metrics: precision, recall, accuracy, F1-score, and AUC-ROC. The experiments show that the logistic regression algorithm performs best on the given dataset.
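The evaluation metrics named above can be illustrated with a minimal sketch. It assumes binary labels where 1 means "clicked" and uses a made-up toy sample, not the paper's dataset. The function names (`confusion_counts`, `classification_metrics`, `auc_roc`) and the 0.5 decision threshold are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 = clicked."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, accuracy, and F1 derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Assumes at least one positive and one negative example.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true click labels and predicted click probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(confusion_counts(y_true, y_pred))
print(classification_metrics(y_true, y_pred))
print(auc_roc(y_true, scores))
```

Precision, recall, accuracy, and F1 depend on the chosen threshold, whereas AUC-ROC depends only on how the scores rank clicked versus non-clicked examples.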


Keywords: machine learning algorithm; logistic regression; k-nearest neighbors; supervised classification; ad-click
