An Experimental Study on Deep Learning Technique Implemented on Low Specification OpenMV Cam H7 Device

Rosa Asmara - State Polytechnic of Malang, Jl. Soekarno Hatta No.09, Malang, 65141, Indonesia
Ulla Rosiani - State Polytechnic of Malang, Jl. Soekarno Hatta No.09, Malang, 65141, Indonesia
Mustika Mentari - State Polytechnic of Malang, Jl. Soekarno Hatta No.09, Malang, 65141, Indonesia
Arie Syulistyo - State Polytechnic of Malang, Jl. Soekarno Hatta No.09, Malang, 65141, Indonesia
Milyun Shoumi - State Polytechnic of Malang, Jl. Soekarno Hatta No.09, Malang, 65141, Indonesia
Mungki Astiningrum - State Polytechnic of Malang, Jl. Soekarno Hatta No.09, Malang, 65141, Indonesia


DOI: http://dx.doi.org/10.62527/joiv.8.2.2299

Abstract


This research implements and evaluates deep learning techniques on the low-specification OpenMV Cam H7. All tests were carried out using deep learning and applied to several functions, including face recognition, facial expression recognition, object detection and counting, and object depth estimation. Facial expression recognition used a convolutional neural network to recognize five facial expressions: angry, happy, neutral, sad, and surprised, with a primary dataset captured using a 48 MP camera. Several scenarios were prepared to cover environmental variability in the implementation, such as indoor and outdoor environments with different lighting conditions and distances. Most of the pre-trained models for each identification or recognition task used MobileNetV2, since this model has a low computational cost and suits low-specification hardware. The object detection and counting module compared two methods: the conventional Haar Cascade and the deep learning MobileNetV2 model. Training and validation are not recommended on OpenMV devices and should instead be carried out on high-specification computers. The models were trained and validated using selected primary and secondary data totaling 1,500 images, and the computing time required is around 5 minutes for ten epochs. On average, recognition on the OpenMV device takes around 0.3 to 2 seconds per frame. The accuracy of the recognition results varies depending on the pre-trained model and the dataset used, but overall, the accuracy levels achieved are very high, exceeding 96.6%.
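To make the on-device side of the pipeline concrete, the following MicroPython sketch shows how an expression classifier of this kind could run on the OpenMV Cam H7. It is a minimal sketch, not the authors' code: it assumes a quantized MobileNetV2 model has already been copied to the camera's storage under the hypothetical name expression_mnv2.tflite, and it uses the tf module from classic OpenMV firmware (newer firmware exposes a similar ml module instead).

# On-device inference sketch for the OpenMV Cam H7 (MicroPython).
# Assumes a quantized MobileNetV2 model "expression_mnv2.tflite" (hypothetical
# name) is on the camera's filesystem; module names vary across firmware versions.
import sensor
import time
import tf

sensor.reset()
sensor.set_pixformat(sensor.RGB565)   # color frames for MobileNetV2
sensor.set_framesize(sensor.QVGA)     # 320x240 capture
sensor.skip_frames(time=2000)         # let the sensor settle

labels = ["angry", "happy", "neutral", "sad", "surprised"]
net = tf.load("expression_mnv2.tflite", load_to_fb=True)  # spare the MicroPython heap

clock = time.clock()
while True:
    clock.tick()
    img = sensor.snapshot()
    # With default arguments, classify() runs the network once over the
    # whole frame and returns a single classification result.
    for obj in net.classify(img):
        scores = obj.output()
        best = scores.index(max(scores))
        print(labels[best], scores[best], "fps:", clock.fps())

The per-frame time observed this way corresponds to the 0.3 to 2 seconds per frame reported above; it depends mainly on the model's input resolution and quantization.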
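The conventional Haar Cascade baseline used in the detection and counting comparison is built into the OpenMV firmware, so that side of the comparison needs no neural network at all. A plausible counting loop is sketched below using the firmware's bundled frontalface cascade; the threshold and scale values are illustrative defaults, not the tuned settings from the study.

# Haar Cascade face detection and counting on OpenMV (MicroPython).
# Parameter values are illustrative, not the study's tuned configuration.
import sensor
import image
import time

sensor.reset()
sensor.set_pixformat(sensor.GRAYSCALE)  # Haar features operate on grayscale
sensor.set_framesize(sensor.QVGA)
sensor.skip_frames(time=2000)

# Cascade bundled with the firmware; more stages trade speed for precision.
face_cascade = image.HaarCascade("frontalface", stages=25)

clock = time.clock()
while True:
    clock.tick()
    img = sensor.snapshot()
    faces = img.find_features(face_cascade, threshold=0.75, scale_factor=1.25)
    for r in faces:
        img.draw_rectangle(r)       # outline each detection
    print("count:", len(faces), "fps:", clock.fps())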
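Since training and validation happen on a high-specification computer rather than on the camera, the host-side workflow reduces to fine-tuning MobileNetV2 and exporting an int8 TFLite file small enough for the H7. The Keras sketch below is one plausible version of that workflow under stated assumptions (directory-per-class dataset layout, 96x96 input, 0.35 width multiplier); it is not the authors' exact training code.

# Host-side sketch: fine-tune MobileNetV2 on ~1,500 labeled images, then
# export an int8-quantized TFLite model for the OpenMV Cam H7. The dataset
# layout, input size, and width multiplier are assumptions.
import tensorflow as tf

IMG_SIZE = (96, 96)  # small input keeps the model inside MCU memory budgets

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "dataset/train", image_size=IMG_SIZE, batch_size=32)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "dataset/val", image_size=IMG_SIZE, batch_size=32)

# Narrow (alpha=0.35) MobileNetV2 backbone, frozen for transfer learning.
base = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SIZE + (3,), alpha=0.35,
    include_top=False, weights="imagenet")
base.trainable = False

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),  # five expression classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=10)

# Full-integer quantization so the model fits and runs efficiently on the H7.
def representative_data():
    for images, _ in train_ds.take(4):          # a small calibration sample
        for img in images:
            yield [tf.expand_dims(img, 0)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
with open("expression_mnv2.tflite", "wb") as f:
    f.write(converter.convert())

Full-integer quantization is the natural choice here: the H7 has no floating-point accelerator to speak of, and an int8 model both shrinks the file roughly fourfold and speeds up per-frame inference on the microcontroller.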


Keywords


Deep learning; face recognition; face expression recognition; depth estimation; object detection; object counting; CNN

