Implementation of Convolutional Neural Network and Long Short-Term Memory Algorithms in Human Activity Recognition Based on Visual Processing Video

Andi Rachman - Universitas Siliwangi Tasikmalaya, Indonesia
Husni Mubarok - Universitas Siliwangi Tasikmalaya, Indonesia
Euis Nur Fitriani Dewi - Universitas Siliwangi Tasikmalaya, Indonesia
Rama Edwinda Putra - Universitas Siliwangi Tasikmalaya, Indonesia


Human Activity Recognition (HAR) is an active research topic, particularly for identifying human movement in video-based security surveillance and for detecting symptoms of illness from movement. In this research, HAR is the key to better understanding the semantics contained in a video in order to find patterns of human movement, especially in sports. The study combines the CNN and LSTM algorithms, using several variations of the parameter values for the dropout layer and the batch size, and converts the patterns in the video into image form to produce a HAR model. The convolution layers extract spatial features from each frame. The extracted features are fed to the LSTM layer, which models the temporal sequence of human movement. In this way, the network learns spatiotemporal features directly through end-to-end training and testing, with the aim of producing a robust model. The test data are 10 sports activities obtained from related research at the University of Central Florida (UCF). The classification results show a loss value of 0.4 and an accuracy of 0.94. Performance was therefore quite good, although some sports activities were still misclassified because their movements are similar. Further research should reduce the loss value, which is still high and which led to repeated classification errors between sports activities with similar movements.
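The data flow described in the abstract (per-frame spatial features from a convolution stage, followed by an LSTM over the frame sequence and a 10-class output) can be illustrated with a minimal sketch. This is not the authors' model: it uses only the Python standard library, random untrained weights, and tiny synthetic 8x8 "frames", and it omits training, dropout, and batching. All names and sizes (`conv2d_relu_pool`, `frame_features`, `lstm_step`, `classify_clip`, `N_KERNELS`, `HIDDEN`, `SEQ_LEN`) are hypothetical and chosen only for illustration.

```python
import math
import random


def conv2d_relu_pool(frame, kernel):
    """Valid 2D convolution, ReLU, then global average pooling to one scalar."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(frame) - kh + 1, len(frame[0]) - kw + 1
    total = 0.0
    for r in range(out_h):
        for c in range(out_w):
            acc = sum(frame[r + i][c + j] * kernel[i][j]
                      for i in range(kh) for j in range(kw))
            total += max(acc, 0.0)  # ReLU
    return total / (out_h * out_w)


def frame_features(frame, kernels):
    """Spatial feature vector for one frame: one pooled value per kernel."""
    return [conv2d_relu_pool(frame, k) for k in kernels]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, weights, biases):
    """One LSTM time step; weights[g] has shape (hidden, len(x) + hidden)."""
    z = x + h  # concatenate current input and previous hidden state

    def gate(name, act):
        return [act(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(weights[name], biases[name])]

    i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
    g = gate("g", math.tanh)  # candidate cell values
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def classify_clip(frames, kernels, weights, biases, out_w):
    """Run the conv stage on each frame, the LSTM over time, then classify."""
    hidden = len(biases["i"])
    h, c = [0.0] * hidden, [0.0] * hidden
    for frame in frames:
        h, c = lstm_step(frame_features(frame, kernels), h, c, weights, biases)
    logits = [sum(w * v for w, v in zip(row, h)) for row in out_w]
    return softmax(logits)


# Assumed toy sizes and random (untrained) parameters, for illustration only.
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


N_KERNELS, HIDDEN, N_CLASSES, SEQ_LEN = 4, 8, 10, 5
kernels = [rand_matrix(3, 3) for _ in range(N_KERNELS)]
weights = {g: rand_matrix(HIDDEN, N_KERNELS + HIDDEN) for g in "ifog"}
biases = {g: [0.0] * HIDDEN for g in "ifog"}
out_w = rand_matrix(N_CLASSES, HIDDEN)
frames = [rand_matrix(8, 8) for _ in range(SEQ_LEN)]  # stand-in for video frames

probs = classify_clip(frames, kernels, weights, biases, out_w)
print(len(probs), round(sum(probs), 6))
```

Because the weights are random, the resulting class probabilities carry no meaning; the sketch only shows how a sequence of frames is reduced to per-frame feature vectors, accumulated by the LSTM state, and mapped to a distribution over 10 activity classes. In practice a deep learning framework would replace these hand-written loops.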


Human Activity Recognition (HAR); Classification; Convolutional Neural Network; Long Short-Term Memory

