Data Mining Techniques for Pandemic Outbreak in Healthcare

Nur Izyan Suraya Abdul Satar - Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia
Azlinah Mohamed - Institute for Big Data Analytics and Artificial Intelligence, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia
Azliza Mohd Ali - Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia

Outbreaks of SARS-CoV, MERS-CoV and Covid-19 have attracted worldwide attention because they have affected many countries and become global public health issues. Covid-19, which emerged in late 2019, was declared a pandemic and a public health emergency of international concern, and it has been ranked as the sixth most serious pandemic internationally. Tracking and analyzing this pandemic require methods that identify its patterns with high accuracy, precision and recall, because the pandemic produces large and complex datasets. Pattern identification is now widely applied because of the rapid growth of data, and it can create a knowledge-rich environment that significantly improves the quality of clinical decisions and reveals relationships between data items. There is therefore a need to review the data mining techniques that have been applied to pandemic outbreaks in healthcare. This study analyzes the data mining algorithms that past research has applied to pandemic outbreaks such as SARS-CoV, MERS-CoV and Covid-19. The results show that two classification algorithms, Naïve Bayes and Decision Tree, are appropriate and achieve more than 90% accuracy in both pandemic and healthcare studies. These algorithms will be investigated further on large Covid-19 datasets, which could help researchers and healthcare practitioners control coronavirus infection using the data mining techniques discussed.
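The abstract compares algorithms by accuracy, precision and recall. For one positive class these are accuracy = (TP + TN) / N, precision = TP / (TP + FP) and recall = TP / (TP + FN). The snippet below is a minimal sketch of that arithmetic, not the authors' evaluation code. The function name `classification_metrics` and the ten "infected"/"healthy" labels are hypothetical and were chosen only to illustrate the calculation. They are not data from any of the reviewed studies.

```python
def classification_metrics(y_true, y_pred, positive):
    """Return (accuracy, precision, recall) treating `positive` as the positive class.

    Hypothetical helper for illustration only.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    # True positives: positive cases correctly predicted as positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: other cases wrongly predicted as positive.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: positive cases the classifier missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical labels, invented for this example.
y_true = ["infected", "infected", "infected", "healthy", "healthy",
          "healthy", "healthy", "infected", "healthy", "healthy"]
y_pred = ["infected", "infected", "healthy", "healthy", "healthy",
          "infected", "healthy", "infected", "healthy", "healthy"]

acc, prec, rec = classification_metrics(y_true, y_pred, positive="infected")
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
# prints: accuracy=0.80 precision=0.75 recall=0.75
```

Here TP = 3, FP = 1, FN = 1 and TN = 5, so 8 of the 10 predictions are correct. Precision and recall are reported alongside accuracy because, on imbalanced data, a classifier that labels every case "healthy" can still reach high accuracy while finding no infected cases.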


Data mining techniques; pandemic outbreak; healthcare; classification; algorithms.
