Text Mining for News Forecasting on The Turnback Hoax Website

Rio Wirawan - Universitas Pembangunan Nasional Veteran Jakarta, Jakarta, Indonesia
Erly Krisnanik - Universitas Pembangunan Nasional Veteran Jakarta, Jakarta, Indonesia
Artika Arista - Universiti Malaya, Kuala Lumpur, Malaysia

DOI: http://dx.doi.org/10.62527/joiv.8.1.1939


The rapid growth of information technology has allowed news to spread swiftly over the internet. This speed often causes confusion because the truth of a story cannot readily be ascertained. In addition, the growing popularity of online social media makes it a fertile environment for propagating false information, including misinformation, fake reviews, advertising, rumors, political remarks, and innuendo. This study aims to classify news data with a text mining approach so that a system can perform the classification automatically. The study produces a dataset that can then be used to build an application that applies data mining to predict breaking news, and such an application was produced to forecast recent news. The data were classified with a naïve Bayes model. The model achieved an accuracy of 77%, obtained with training data of 82%. Of 994 content items, 33.9% were classified as misleading content, 24.85% as false content, 13.48% as imitation content, 11.07% as fake content, 9.86% as manipulated content, 3.22% as parody content, 2.31% as satire content, and 1.31% as connection content. The results are visualized with bar charts and word clouds. The work also produced datasets, built with the naïve Bayes method, of news data and of news that has been validated. These datasets will subsequently be used to develop prototype software applications.
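The abstract does not describe how the classifier was implemented. As an illustration only, the following is a minimal sketch of naïve Bayes text classification, assuming a multinomial model with Laplace (add-one) smoothing over whitespace-separated tokens. The function names (`tokenize`, `train_naive_bayes`, `predict`, `accuracy`) are hypothetical, and the headlines and labels are invented toy examples, not items from the study's dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Count documents per class and token occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log prior plus summed log likelihoods."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior of the class
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # tokens never seen in training are ignored
            # Laplace smoothing keeps unseen class/token pairs from zeroing the score.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the given label."""
    correct = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Invented toy data, for illustration only.
train_docs = [
    "vaccine causes magnetism claim shared widely",
    "photo edited to show flooded palace",
    "comedy site jokes about minister moving to mars",
    "old statistics quoted to suggest vaccine failure",
    "video edited to add fake crowd",
    "comedy column jokes about traffic on mars",
]
train_labels = [
    "misleading", "manipulated", "satire",
    "misleading", "manipulated", "satire",
]
test_docs = ["edited photo of crowd", "jokes about mars"]
test_labels = ["manipulated", "satire"]

model = train_naive_bayes(train_docs, train_labels)
print(predict(model, test_docs[0]))
print(accuracy(model, test_docs, test_labels))
```

Working in log space avoids numeric underflow when many token probabilities are multiplied. A real pipeline for Indonesian news text would probably also need stop-word removal and stemming, which this sketch omits.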


Keywords: news; text mining; Turnback Hoax website; dataset; naïve Bayes
