Illiteracy Classification Using K Means-Naïve Bayes Algorithm

Muhammad Firman Saputra - State University of Malang, Indonesia
Triyanna Widiyaningtyas - State University of Malang, Indonesia
Aji Wibawa - State University of Malang, Indonesia


Illiteracy is the inability to recognize written characters, and therefore to read and write. It is a significant problem for countries around the world, including Indonesia, where the illiteracy rate is commonly used as an indicator of whether education is succeeding. If the problem is not overcome, it will affect people’s prosperity. One approach that has been used is to prioritize treatment in areas with the highest illiteracy rates, followed by areas with lower rates. This approach is much easier to apply when it is supported by a classification process. Because classification requires predefined classes, and no well-established classification of illiteracy rates exists, a clustering process is needed before classification. This research aims to find the optimal number of classes through clustering and to evaluate the resulting illiteracy classification. Clustering is performed with the k-means algorithm, and classification with the Naïve Bayes algorithm. Classification performance is assessed with 10-fold cross-validation. The results show that three classes are optimal, with a classification accuracy of 96.4912% and an error rate of 3.5088%. By comparison, classification with two classes gives an accuracy of 93.8596% and an error rate of 6.1404%, and classification with five classes gives an accuracy of 90.3509% and an error rate of 9.6491%.
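
The pipeline summarized in the abstract (cluster first to obtain class labels, then classify and score with 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic one-dimensional "illiteracy rate" values invented for illustration, the Gaussian form of Naïve Bayes is an assumption (the abstract does not say which variant was used), and all function names (`kmeans`, `fit_nb`, `predict_nb`, `cross_val_accuracy`) are hypothetical. The error rate is computed as 100% minus the accuracy, which matches the accuracy and error pairs reported above.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = random.Random(seed).sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def fit_nb(X, y):
    """Gaussian Naive Bayes (assumed variant): prior, means, variances per class."""
    model = {}
    for c in set(y):
        rows = [x for x, l in zip(X, y) if l == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        # Small constant avoids zero variance for tiny classes.
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                     for col, m in zip(cols, means)]
        model[c] = (len(rows) / len(X), means, variances)
    return model


def predict_nb(model, x):
    """Return the class with the highest log-posterior for sample x."""
    def log_posterior(c):
        prior, means, variances = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances))
    return max(model, key=log_posterior)


def cross_val_accuracy(X, y, folds=10, seed=0):
    """k-fold cross-validation accuracy in percent."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test = set(idx[f::folds])
        train = [i for i in idx if i not in test]
        model = fit_nb([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict_nb(model, X[i]) == y[i] for i in test)
    return 100.0 * correct / len(X)


# Synthetic illiteracy rates in percent; NOT the data used in the paper.
rng = random.Random(42)
X = [(rng.gauss(mu, 0.8),) for mu in (2.0, 8.0, 15.0) for _ in range(30)]

results = {}
for k in (2, 3, 5):  # the class counts compared in the abstract
    labels = kmeans(X, k)
    accuracy = cross_val_accuracy(X, labels)
    results[k] = (accuracy, 100.0 - accuracy)
    print(f"k={k}: accuracy={accuracy:.4f}%  error={100.0 - accuracy:.4f}%")
```

Because the labels come from the clustering step itself, the accuracy here measures how well Naïve Bayes reproduces the cluster boundaries, so comparing it across values of k is one way to judge which number of classes is easiest to separate.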


Illiteracy; Clustering; K means; Classification; Naïve Bayes

