Exploration of the Impact of Kernel Size for YOLOv5-Based Object Detection on a Quadcopter

Rissa Rahmania - Bina Nusantara University, Bandung Campus, Jakarta, Indonesia
Felix Corputty - Telkom University, Bandung, Indonesia
Suryo Wibowo - Telkom University, Bandung, Indonesia
Dany Saputra - Bina Nusantara University, Bandung Campus, Jakarta, Indonesia
Annisa Istiqomah - Bina Nusantara University, Bandung Campus, Jakarta, Indonesia

DOI: http://dx.doi.org/10.30630/joiv.6.3.898


Drones, or quadcopters, are widely used across many fields and increasingly rely on deep learning, especially for object detection. However, characteristics of drone vision such as occlusion and small objects still limit performance in terms of both detection accuracy and speed. The YOLO architecture is commonly used in cases that require high-speed detection. To address the limitations of drone vision, this paper explores the kernel size of the shallowest convolutional layer in the YOLOv5s backbone to achieve better performance. The kernel is the filter that produces the feature map: it defines the size of the convolution matrix, and the features extracted in the shallowest convolutional layer are the most representative for object detection and recognition. The approach comprises three stages: (1) data preprocessing, which involves augmentation and normalization of the data; (2) kernel-size exploration in the shallowest convolutional layer of YOLOv5s; and (3) model deployment in a real environment using the quadcopter. The dataset consists of four classes (dragon fruit, snake fruit, banana, and pineapple) with a total of 8,000 images. The kernel-size exploration gives promising results: kernel sizes of 5 and 7 both achieve an mAP of 0.988. These results suggest that modifying the kernel size opens opportunities for more in-depth investigation, such as varying the number of epochs, the padding scheme, and other optimization techniques.
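To illustrate why the kernel size of the shallowest layer can be varied without disturbing the rest of the network, consider the standard convolution output-size formula. The sketch below is not the authors' code; it assumes a YOLOv5s-style first layer with stride 2 on a 640x640 input (illustrative values) and shows that, with a matching "same"-style padding of k//2, kernels of 3, 5, and 7 all produce the same spatial output size, so downstream layers are unaffected:

```python
def conv_out_size(in_size, kernel, stride=1, padding=0):
    """Spatial output size of a standard convolution."""
    return (in_size + 2 * padding - kernel) // stride + 1

def same_padding(kernel):
    """Padding that preserves spatial size for odd kernels (k // 2)."""
    return kernel // 2

# Kernel sizes explored in the paper for the shallowest backbone layer.
for k in (3, 5, 7):
    p = same_padding(k)
    out = conv_out_size(640, kernel=k, stride=2, padding=p)
    print(f"kernel={k} padding={p} -> output {out}x{out}")
```

All three configurations yield a 320x320 output map; what changes is the receptive field and parameter count of the first layer, which is why the larger kernels can extract more representative shallow features at a modest computational cost.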


YOLOv5; object detection; kernel size; quadcopter; deep learning.
