How to Deeply Analyze the Content of Online Newspapers Using Clustering and Correlation

Yeni Rokhayati - Multimedia and Network Engineering, Politeknik Negeri Batam, Batam, 29641, Indonesia
Sartikha - Informatics Engineering, Politeknik Negeri Batam, Batam, 29641, Indonesia
Nur Zahrati Janah - Informatics Engineering, Politeknik Negeri Batam, Batam, 29641, Indonesia

Growth in visitor numbers is one of the keys to increasing revenue for online newspapers, whether through more advertisements, Google AdSense income, or greater customer trust. It is therefore important to identify which news categories increase the number of visitors and to analyze them in depth. Because online newspaper sites add content every day, even every hour, analyzing their content patterns differs from analyzing the content patterns of regular websites. This study contributes a method for analyzing website content, particularly online news: clustering is used to group news categories by whether they bring a high, medium, or low number of visitors, and correlation analysis is then used to explore the strength of the relationships between variables, namely which parameters have a large or small effect on the increase in the number of visitors. A local online newspaper company based in Batam serves as the case study. Data are collected, preprocessed, and then analyzed using the clustering and correlation methods. The analysis of news content readership suggests which news categories should be optimized because they increase the number of visitors. A summary of the analysis steps in this study is presented. We also provide suggestions for other online newspaper owners or researchers interested in a similar analysis of online news content.


Clustering; content analysis; correlation; data mining; news category; online newspapers.
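
The two-stage analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the category names, the figures in `CATEGORY_STATS`, the choice of "articles published" as the explanatory parameter, and all function names are hypothetical and chosen only for illustration. The sketch assumes a plain k-means with three clusters on per-category visitor totals, followed by a Pearson correlation between one parameter and the visitor count.

```python
from math import sqrt

# Hypothetical per-category totals (NOT data from the study):
# category -> (articles_published, visitors)
CATEGORY_STATS = {
    "politics": (120, 5400),
    "sports": (80, 3100),
    "crime": (150, 9800),
    "business": (60, 2900),
    "lifestyle": (40, 900),
    "technology": (30, 700),
    "local": (200, 10500),
}


def kmeans_1d(values, k=3, max_iter=100):
    """Plain Lloyd's k-means on scalars; returns (centroids, assignments)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    assignments = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_assignments == assignments:
            break  # converged: no value changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = sum(members) / len(members)
    return centroids, assignments


def label_clusters(centroids, assignments, names=("low", "medium", "high")):
    """Name clusters by ascending centroid, so the largest centroid is 'high'."""
    order = sorted(range(len(centroids)), key=lambda c: centroids[c])
    rank = {c: names[i] for i, c in enumerate(order)}
    return [rank[a] for a in assignments]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


categories = list(CATEGORY_STATS)
articles = [CATEGORY_STATS[c][0] for c in categories]
visitors = [CATEGORY_STATS[c][1] for c in categories]

# Stage 1: group categories into low / medium / high visitor tiers.
centroids, assignments = kmeans_1d(visitors)
tiers = dict(zip(categories, label_clusters(centroids, assignments)))

# Stage 2: measure how strongly one parameter moves with visitor counts.
r = pearson(articles, visitors)

for category in categories:
    print(f"{category:<12} {tiers[category]}")
print(f"Pearson r (articles vs. visitors): {r:.3f}")
```

With real data, the visitor totals would come from the preprocessed analytics export, and the correlation step would be repeated for each candidate parameter so that their coefficients can be compared. The deterministic initialization is a convenience for reproducibility on small inputs; a library implementation with multiple random restarts may be preferable for larger datasets.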
