Illiteracy Classification Using K Means-Naïve Bayes Algorithm

Muhammad Firman Aji Saputra; Triyanna Widiyaningtyas; Aji Prasetya Wibawa

doi:10.30630/joiv.2.3.129

Muhammad Firman Saputra - State University of Malang, Indonesia
Triyanna Widiyaningtyas - State University of Malang, Indonesia
Aji Wibawa - State University of Malang, Indonesia

Abstract

Illiteracy is the inability to recognize characters well enough to read and write. It is a significant problem for countries around the world, including Indonesia, where the illiteracy rate is commonly used as an indicator of whether education has been successful. If the problem is not overcome, it will affect people's prosperity. One approach that has been used is to prioritize treatment in the areas with the highest illiteracy rates, followed by areas with lower rates. This approach is much easier to apply when it is supported by a classification process. Because classification requires predefined classes, and no well-defined classification of illiteracy rates exists yet, a clustering process is needed before classification. This research aims to find the optimal number of classes through clustering and to evaluate the resulting illiteracy classification. Clustering is performed with the k-means algorithm, and classification with the Naïve Bayes algorithm. The success of the classification is assessed with 10-fold cross-validation. The results show that the optimal number of illiteracy classes is three, with a classification accuracy of 96.4912% and an error rate of 3.5088%. By comparison, classification with two classes achieves an accuracy of 93.8596% and an error rate of 6.1404%, and classification with five classes achieves an accuracy of 90.3509% and an error rate of 9.6491%.
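The abstract describes a two-stage pipeline: k-means first produces class labels for the areas, then a Naïve Bayes classifier is trained on those labels and scored with 10-fold cross-validation, where the error rate is simply 100% minus the accuracy. The pure-Python sketch below illustrates that pipeline under stated assumptions; it is not the authors' implementation. Assumptions: each area is a vector of numeric features (the abstract does not list them), k-means is seeded with a deterministic farthest-point (maximin) rule, Naïve Bayes uses Gaussian likelihoods for numeric attributes, the folds are plain rather than stratified, and the data in the usage block are synthetic.

```python
import math
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_maximin(points, k):
    """Assumed deterministic seeding: start from the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's k-means; returns a cluster label (0..k-1) for every point."""
    centroids = init_maximin(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def fit_gnb(X, y):
    """Gaussian Naive Bayes: per-class prior plus per-feature mean and variance."""
    by_class = defaultdict(list)
    for x, c in zip(X, y):
        by_class[c].append(x)
    model = {}
    for c, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        # A tiny variance floor avoids division by zero for constant features.
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                     for col, m in zip(cols, means)]
        model[c] = (len(rows) / len(X), means, variances)
    return model


def predict_gnb(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    def log_posterior(c):
        prior, means, variances = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances))
    return max(model, key=log_posterior)


def cross_val_accuracy(X, y, folds=10, seed=0):
    """Plain (unstratified) k-fold cross-validation accuracy of Gaussian NB."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test = set(idx[f::folds])
        train = [i for i in idx if i not in test]
        model = fit_gnb([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict_gnb(model, X[i]) == y[i] for i in test)
    return correct / len(X)


if __name__ == "__main__":
    # Hypothetical data: 60 areas, 2 numeric features each, around three levels.
    rng = random.Random(1)
    areas = [[rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)]
             for mu in (0.0, 10.0, 20.0) for _ in range(20)]
    for k in (2, 3, 5):  # candidate class counts, as compared in the abstract
        acc = cross_val_accuracy(areas, kmeans(areas, k))
        print(f"k={k}: accuracy {acc:.4%}, error rate {1 - acc:.4%}")
```

Looping over k = 2, 3 and 5 mirrors the comparison reported in the abstract: the number of clusters is chosen by how well a classifier can reproduce the cluster labels. Because the data here are synthetic, the printed figures say nothing about the paper's results.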

Keywords

Illiteracy; Clustering; K-means; Classification; Naïve Bayes
