Enhancing Weather Prediction Models through the Application of Random Forest Method and Chi-Square Feature Selection

Helena Nurramdhani Irmanda; Ermatita Ermatita; Mohd Khalid bin Awang; Muhammad Adrezo

doi:10.62527/joiv.8.3.2356

Helena Irmanda - Universitas Pembangunan Nasional Veteran Jakarta, Cilandak, Jakarta Selatan, Indonesia
Ermatita Ermatita - Universitas Sriwijaya, Ogan Ilir, Palembang, Indonesia
Mohd Khalid bin Awang - Universitas Sultan Zainal Abidin, Terengganu, Malaysia
Muhammad Adrezo - Universitas Pembangunan Nasional Veteran Jakarta, Cilandak, Jakarta Selatan, Indonesia

Abstract

This study examines weather forecasting methodologies, focusing on the climatic challenges faced by Indramayu Regency and their considerable impact on agriculture, specifically rice production and national food security. It emphasizes the need for accurate weather forecasting, especially under ongoing climate change, by highlighting the region's vulnerability to weather anomalies and the disruption they can cause to crop output. To address these issues, the study investigates machine learning techniques, particularly ensemble learning methods such as Random Forest combined with Chi-Square feature selection. The article outlines the research approach, including data collection from Indonesia's Meteorology, Climatology, and Geophysics Agency (BMKG), data pre-processing, feature selection, and data splitting. Notably, the methodology applies the Synthetic Minority Over-sampling Technique (SMOTE) to correct class imbalance and builds the model on key weather attributes (humidity, wind speed, and wind direction). The resulting Random Forest model achieves an accuracy of 87.6% in forecasting different types of rainfall. However, the study indicates potential overfitting in some rainfall classes, implying the need for additional data augmentation or refinement of the modeling technique. In conclusion, this study demonstrates the potential efficacy of ensemble learning techniques for weather prediction in Indramayu Regency. It emphasizes the need for accurate forecasts in the agricultural and fisheries industries and suggests directions for further investigation, including alternative prediction approaches such as deep learning.
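
The Chi-Square feature selection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy records, the binned feature values, the two-class label, and the `day_parity` noise feature are all hypothetical, and continuous weather readings are assumed to have been discretized beforehand. The sketch scores each categorical feature with the Pearson chi-square statistic against the class label and ranks the features by that score.

```python
from collections import Counter


def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic between one categorical feature and the labels."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))  # joint counts per (value, class)
    feature_totals = Counter(feature_values)         # row totals
    label_totals = Counter(labels)                   # column totals
    score = 0.0
    for value, value_count in feature_totals.items():
        for label, label_count in label_totals.items():
            # Expected count if the feature and the label were independent.
            expected = value_count * label_count / n
            score += (observed[(value, label)] - expected) ** 2 / expected
    return score


def rank_features(rows, labels, feature_names):
    """Return (feature name, score) pairs sorted from most to least associated."""
    scores = {
        name: chi_square_score([row[i] for row in rows], labels)
        for i, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical, hand-made records (NOT BMKG data); values are pre-binned.
feature_names = ["humidity", "wind_speed", "day_parity"]
rows = [
    ("high", "windy", "even"),
    ("high", "windy", "odd"),
    ("high", "windy", "even"),
    ("high", "calm", "odd"),
    ("low", "windy", "even"),
    ("low", "calm", "odd"),
    ("low", "calm", "even"),
    ("low", "calm", "odd"),
]
labels = ["rain", "rain", "rain", "rain", "dry", "dry", "dry", "dry"]

ranking = rank_features(rows, labels, feature_names)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A higher score means a stronger departure from independence between the feature and the class, so the top-ranked features would be kept for model construction. In practice, a library routine such as scikit-learn's `SelectKBest` with the `chi2` scorer could replace the hand-written function; whether the study used it is not stated in the abstract.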

Keywords

Ensemble learning; random forest; prediction; weather.
