Classification of Predicting Customer Ad Clicks Using Logistic Regression and k-Nearest Neighbors

Yasi Dani - Bina Nusantara University


In the past, marketing was conducted through conventional (non-digital) channels. In recent years, conventional marketing has been shifting toward online (digital) marketing, which requires internet access. Online marketing has many advantages, especially cost efficiency and the ability to deliver information to the public more quickly and widely. Therefore, many companies are interested in online marketing and advertise on social media platforms and websites. However, one of the challenges for companies in online marketing is determining the right target consumers: if ads reach consumers who are not interested in buying their products, advertising costs will be high. One measure used in online advertising is ad clicks, which count how many users click on an online ad. Thus, companies need a click prediction system so that they know the right target consumers. Nowadays, many advertisers and search engines rely on modeling to predict ad clicks accurately. This paper constructs a customer ad-click prediction model using a machine learning approach, since machine learning systems have become more sophisticated in their ability to predict the probability of a click. We propose two classification algorithms: logistic regression (LR), which produces probabilistic outputs, and the k-nearest neighbors (k-NN) classifier, which produces non-probabilistic outputs. Furthermore, this study compares the two classification algorithms and determines the better one based on performance; we compute the confusion matrix and several metrics, namely precision, recall, accuracy, F1-score, and AUC-ROC. The higher the metric values, the better the classification algorithm for predictive analysis of users clicking on ads. The data set is an advertising dataset from a marketing agency.
The purpose of this research is to help companies or businesses use the right method for predictive analysis to reach the right target consumers.
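
The evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the paper's code: the function names (`confusion_counts`, `classification_metrics`, `roc_auc`), the 0.5 decision threshold, and the eight example labels and scores are all assumptions made up for illustration, and they do not come from the advertising dataset used in the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 means 'clicked'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Compute precision, recall, accuracy, and F1 from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(y_true)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


def roc_auc(y_true, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: true click labels and a classifier's click scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.3, 0.7, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Precision, recall, accuracy, and F1 depend only on the hard predictions, while AUC-ROC needs a score per example. For LR the predicted click probability can serve as that score; for k-NN one common choice, assumed here rather than taken from the paper, is the fraction of the k neighbors labeled as clicks.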


machine learning algorithm; logistic regression; k-nearest neighbors; supervised classification; ad-click





Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

JOIV : International Journal on Informatics Visualization
ISSN 2549-9610  (print) | 2549-9904 (online)
Organized by Department of Information Technology - Politeknik Negeri Padang, and Institute of Visual Informatics - UKM and Soft Computing and Data Mining Centre - UTHM