Machine Learning Algorithms Based on Sampling Techniques for Raisin Grains Classification

Achmad Bisri - Universitas Islam Negeri Sultan Maulana Hasanuddin Banten, Serang, Indonesia
Mustafa Man - Universiti Malaysia Terengganu, Terengganu, Malaysia


DOI: http://dx.doi.org/10.30630/joiv.7.1.970

Abstract


Raisin grains are among the agricultural commodities that can benefit health. Raisin grains need to be classified during production to achieve optimal results. In this case, classification is carried out on two types of grains, namely Kecimen and Besni. However, inaccurate sample data can affect the performance of a model. In this study, two sampling techniques are proposed: stratified and shuffled. The proposed classification models are random forest (RF), gradient boosted trees (GBT), naive Bayes (NB), logistic regression (LR), and neural network (NN). This study aims to identify the performance of the classification models under each sampling technique. The models are applied to a dataset with seven features, and modeling is done with cross-validation. The models were tested with different proportions of test data, and their performance was evaluated in terms of accuracy and AUC. With stratified sampling, the best outcome across all models was found with 40 percent test data, with a mean accuracy of 85.50% and an AUC of 0.921. With shuffled sampling, the best outcome was found with 20 percent test data, with a mean accuracy of 88.11% and an AUC of 0.935. With stratified sampling, not all models reached the excellent category across all data splits, whereas with shuffled sampling all models did. Therefore, models based on shuffled sampling are superior to those based on stratified sampling. The significance test shows that the performance of RF differs significantly between the two sampling techniques.
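The difference between the two sampling techniques can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation or tooling: the function names, the random seed, and the label counts (500 per class) are assumptions chosen only for illustration. Shuffled sampling randomizes the whole dataset before splitting, while stratified sampling splits each class separately so that the class proportions are preserved in both parts.

```python
import random
from collections import defaultdict


def shuffled_split(labels, test_fraction, seed=0):
    """Shuffle all indices, then take the first test_fraction of them as test data."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    n_test = round(len(indices) * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)


def stratified_split(labels, test_fraction, seed=0):
    """Split each class separately so train and test keep the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


def class_share(labels, indices, target):
    """Fraction of the selected rows that belong to the target class."""
    return sum(labels[i] == target for i in indices) / len(indices)


# Placeholder labels (assumed counts, not taken from the study).
labels = ["Kecimen"] * 500 + ["Besni"] * 500

for fraction in (0.2, 0.4):
    _, strat_test = stratified_split(labels, fraction)
    _, shuf_test = shuffled_split(labels, fraction)
    print(
        f"test={fraction:.0%}",
        f"stratified Kecimen share={class_share(labels, strat_test, 'Kecimen'):.3f}",
        f"shuffled Kecimen share={class_share(labels, shuf_test, 'Kecimen'):.3f}",
    )
```

In the stratified split the Kecimen share of the test set matches the share in the full data by construction, whereas in the shuffled split it varies with the random draw. A classifier and the accuracy and AUC metrics would then be applied to each pair of index lists.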

Keywords


Classification; data mining; machine learning; raisin grains; sampling technique.

