Big Healthcare Data: Survey of Challenges and Privacy

Mohammed Bin Jubeir - Faculty of Computer Systems & Software Engineering, Universiti Malaysia Pahang (UMP), 26300, Kuantan, Pahang, Malaysia.
Mohd Arfian Ismail - Faculty of Computer Systems & Software Engineering, Universiti Malaysia Pahang (UMP), 26300, Kuantan, Pahang, Malaysia.
Shahreen Kasim - Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM), Malaysia
Hidra Amnur - Department of Information Technology, Politeknik Negeri Padang, Sumatera Barat, Indonesia
Defni - Department of Information Technology, Politeknik Negeri Padang, Sumatera Barat, Indonesia

The last century witnessed a dramatic shift towards digitizing the healthcare workflow and moving to electronic patient records. Health information is becoming steadily more diverse and complex, giving rise to so-called massive data. At the same time, demand for big data analytics in healthcare organizations is growing, driven by its potential to extract meaningful information from big data and to improve the quality of healthcare delivery. Big data analytics also aims to increase the effectiveness and efficiency of healthcare organizations, to give doctors and care providers better decision-making information, and to help them detect diseases early. It further supports evidence-based medicine and helps to minimize healthcare costs. However, a clear tension exists between the privacy and security of big data and its widespread use. This paper focuses on big data with respect to its characteristics, trends, and challenges, and reviews the risks and benefits associated with data analytics.
