Real-time Estimation of Road Surfaces using Fast Monocular Depth Estimation and Normal Vector Clustering

Chuho Yi - Department of Software Convergence, Hanyang Women's University, Seoul 04763, South Korea
Jungwon Cho - Department of Computer Education, Jeju National University, Jeju 63243, South Korea


Estimating a road surface or planes with a camera, for AR (Augmented Reality) applications or an autonomous vehicle, requires significant computation. Vision sensors have lower accuracy in distance measurement than other types of sensors, and they carry the difficulty that additional algorithms for estimating depth must be included. However, a camera has the advantage of extracting varied information such as weather conditions, sign information, and road markings that are difficult to measure with other sensors. Various methods differing in sensor type and configuration have been applied. Many existing studies generally performed depth estimation after feature extraction. Recent studies, however, suggest using deep learning to skip those multiple processing stages and employ a single DNN (Deep Neural Network), and methods that use only a single camera rather than a plurality of sensors have been proposed. This paper presents a fast and efficient single-camera method that employs a DNN to extract distance information, and proposes a modified method for using the resulting depth map to obtain surface characteristics in real time. First, a DNN is used to estimate the depth map; then, for fast operation, a normal vector is calculated at each depth point, and a clustering method that connects points belonging to similar planes is provided. An experiment is used to show the validity of our method and to evaluate its calculation time.
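The pipeline described above (depth map → per-point normal vectors → plane clustering) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an already-estimated depth map, computes normals by finite differences, and groups pixels greedily by normal similarity under a hypothetical angle threshold.

```python
import numpy as np

def depth_to_normals(depth, fx=1.0, fy=1.0):
    """Estimate per-pixel surface normals from a depth map via finite differences.

    fx, fy are placeholder scale factors standing in for camera intrinsics.
    """
    dz_dv, dz_du = np.gradient(depth)  # gradients along rows (v) and columns (u)
    # For a surface z = f(x, y), the (unnormalized) normal is (-dz/dx, -dz/dy, 1).
    normals = np.dstack((-dz_du * fx, -dz_dv * fy, np.ones_like(depth)))
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    return normals

def cluster_normals(normals, angle_thresh_deg=10.0):
    """Greedy clustering: assign each pixel to the first cluster whose reference
    normal lies within the angle threshold, or open a new cluster otherwise."""
    h, w, _ = normals.shape
    flat = normals.reshape(-1, 3)
    cos_thresh = np.cos(np.radians(angle_thresh_deg))
    labels = -np.ones(flat.shape[0], dtype=int)
    refs = []  # one reference normal per cluster
    for i, n in enumerate(flat):
        for k, r in enumerate(refs):
            if np.dot(n, r) >= cos_thresh:
                labels[i] = k
                break
        else:
            refs.append(n)
            labels[i] = len(refs) - 1
    return labels.reshape(h, w)

# A single inclined plane: depth increases linearly along the image columns,
# so every pixel shares one normal and the clustering yields one cluster.
depth = np.tile(0.1 * np.arange(8.0), (6, 1))
labels = cluster_normals(depth_to_normals(depth))
```

In a real system the greedy pass would be replaced by a connectivity-aware method (e.g., region growing over the image grid) so that parallel but disconnected surfaces receive distinct labels.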


Keywords: Real-time estimation; deep neural network; road surfaces; fast monocular depth estimation; normal vector clustering.

