Breast Cancer Prediction Using a Hybrid Data Mining Model

Elham Bahmani - Islamic Azad University, Malayer, Iran
Mojtaba Jamshidi - Islamic Azad University, Qazvin, Iran
Abdusalam Shaltooki - University of Human Development, Sulaymaniyah, Iraq

With the emergence of data mining technology and growing access to data, valuable information can now be extracted in many domains. Data mining uses machine learning algorithms to extract useful relationships and knowledge from large amounts of data and offers automatic tools for prediction and classification. One of its most common applications in medicine and healthcare is predicting the type of breast cancer, which has attracted the attention of many researchers. This paper presents a hybrid model that employs three algorithms, a Naive Bayes network, an RBF network, and K-means clustering, to predict breast cancer type. The proposed model combines the results of the three algorithms using a voting approach. The dataset used in this study is the Breast Cancer Wisconsin dataset from the UCI repository. The proposed model is implemented in MATLAB, and its effectiveness in predicting breast cancer type is evaluated on this dataset. The results show that the proposed hybrid model achieves an accuracy of 99% and a mean absolute error of 0.019, which is superior to other models.
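The voting step described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the MATLAB used in the paper, and it assumes simple majority voting over hard class labels; the abstract does not specify the exact voting rule. The function names and the toy labels are hypothetical and are not taken from the paper. The sketch also assumes that the K-means clusters have already been mapped to class labels, and that the mean absolute error is computed on 0/1 labels, which may differ from how the paper computes it.

```python
from collections import Counter


def majority_vote(*prediction_lists):
    """Combine per-sample labels from several models by simple majority."""
    if not prediction_lists:
        raise ValueError("need at least one list of predictions")
    n = len(prediction_lists[0])
    if any(len(p) != n for p in prediction_lists):
        raise ValueError("all prediction lists must have the same length")
    combined = []
    for votes in zip(*prediction_lists):
        # With three voters and two classes there is always a strict majority.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Mean absolute difference between numeric labels (assumption: 0/1 labels)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels (0 = benign, 1 = malignant), not data from the paper.
y_true = [0, 1, 1, 0, 1, 0]
naive_bayes = [0, 1, 0, 0, 1, 0]  # wrong on sample 3
rbf_network = [0, 1, 1, 0, 1, 1]  # wrong on sample 6
kmeans = [1, 1, 1, 0, 0, 0]       # cluster ids assumed already mapped to labels

hybrid = majority_vote(naive_bayes, rbf_network, kmeans)
print("hybrid predictions:", hybrid)
print("accuracy:", accuracy(y_true, hybrid))
print("mean absolute error:", mean_absolute_error(y_true, hybrid))
```

In this toy case each individual model misclassifies at least one sample, but no two models err on the same sample, so the majority vote recovers every true label. This is the general motivation for combining classifiers whose errors are not strongly correlated; it says nothing about the results reported in the paper.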


Data mining, breast cancer, hybrid model, RBF network, Naive Bayes, K-means
