Stock Price Time Series Data Forecasting Using the Light Gradient Boosting Machine (LightGBM) Model

Anggit Hartanto - Universitas Amikom Yogyakarta, Yogyakarta, 55282, Indonesia
Yanuar Nur Kholik - Universitas Amikom Yogyakarta, Yogyakarta, 55282, Indonesia
Yoga Pristyanto - Universitas Amikom Yogyakarta, Yogyakarta, 55282, Indonesia


In stock investment, price fluctuations are common, and they deter many novice investors from investing in stocks. On the other hand, stocks are a type of investment that can be relied upon during disasters or economic turmoil, such as the Covid-19 pandemic that began in 2019. For investors to anticipate these fluctuations, forecasting is needed. This study builds a stock price time series forecasting model using the Light Gradient Boosting Machine (LightGBM) ensemble algorithm, which offers high accuracy and efficiency. Hyperparameters are tuned using Grid Search Cross Validation (GSCV). The study also compares LightGBM with other algorithms to see which model is best at forecasting stock price data. Each model is evaluated with the root mean squared error (RMSE) between the original test data and the predicted values, and the same dataset is used for every model so that the results are comparable. The experimental results show that the LightGBM model can compete with and outperform other boosting-based forecasting models such as XGBoost, AdaBoost, and CatBoost. Future research should pay closer attention to pre-processing, because the data contain many outliers. It should also include exogenous (external) variables, whose selection would need to involve many parties.
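The evaluation procedure described in the abstract (a grid search over hyperparameters with cross validation, followed by an RMSE score on held-out test data) can be sketched roughly as follows. This is a minimal illustration, not the study's code. To stay runnable with the Python standard library alone, it substitutes simple exponential smoothing for LightGBM, so the only "hyperparameter" is a hypothetical smoothing factor `alpha`. The synthetic random-walk prices, the expanding-window fold scheme, and the 80/20 chronological split are all assumptions, and every function name is illustrative.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def ses_forecasts(history, future, alpha):
    """One-step-ahead simple exponential smoothing forecasts for `future`.

    Stand-in model (assumption): the level is fitted on `history`, then
    updated with each value in `future` only after it has been forecast.
    """
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    preds = []
    for y in future:
        preds.append(level)  # forecast is made before y is observed
        level = alpha * y + (1 - alpha) * level
    return preds


def expanding_window_folds(n, n_folds):
    """Yield (train_end, valid_end) index pairs that respect time order."""
    fold_size = n // (n_folds + 1)
    for k in range(1, n_folds + 1):
        yield k * fold_size, (k + 1) * fold_size


def grid_search_cv(train, alphas, n_folds=3):
    """Return (best_alpha, scores), where scores maps alpha to mean CV RMSE."""
    scores = {}
    for alpha in alphas:
        fold_scores = []
        for train_end, valid_end in expanding_window_folds(len(train), n_folds):
            valid = train[train_end:valid_end]
            preds = ses_forecasts(train[:train_end], valid, alpha)
            fold_scores.append(rmse(valid, preds))
        scores[alpha] = sum(fold_scores) / len(fold_scores)
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic random-walk "prices": made-up data, seeded for repeatability.
rng = random.Random(0)
prices = [100.0]
for _ in range(199):
    prices.append(prices[-1] + rng.gauss(0.0, 1.0))

split = int(len(prices) * 0.8)  # chronological split, no shuffling
train, test = prices[:split], prices[split:]

best_alpha, cv_scores = grid_search_cv(train, alphas=[0.1, 0.3, 0.5, 0.7, 0.9])
test_rmse = rmse(test, ses_forecasts(train, test, best_alpha))
print(f"best alpha: {best_alpha}, test RMSE: {test_rmse:.3f}")
```

Keeping the folds and the final split in time order avoids training on observations that come after the ones being predicted. Comparing several models fairly, as the abstract describes, would amount to running the same split and the same `rmse` function for each candidate model.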


Machine learning; prediction; forecasting; time series; LightGBM
