Monocular visual 3D cuboid measurement method based on Mask-RCNN and SFM

SONG Le1,2, HOU Yupeng1,2, ZHANG Junpeng1, WU Tong1, QI Haoming1, SHANG Enhao1

(1. State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin 300072, China; 2. Laboratory of Micro/Nano Manufacturing Technology, Tianjin University, Tianjin 300072, China)

Abstract: To overcome the multi-view requirement of structure from motion (SFM) reconstruction and to achieve automatic three-dimensional measurement, a monocular vision method based on Mask-region convolutional neural networks (Mask-RCNN) and SFM is proposed for 3D cuboid measurement. Taking a box as an example, the method consists of three parts: measurement point extraction, transformation matrix calculation, and 3D mapping measurement. Only a single calibration is required to obtain the intrinsic parameters, after which 3D measurement from single-view images is carried out automatically using deep learning. The method avoids complex reconstruction and lowers the application requirements of visual measurement. Experimental results show that the relative standard uncertainty of the measurements is less than 6% with a checkerboard marker and less than 8% with the markers on the box itself.

Key words: deep learning; Mask-region convolutional neural networks (Mask-RCNN); monocular vision; structure from motion (SFM); 3D measurement
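
The abstract reports accuracy as a relative standard uncertainty. As a minimal sketch of how such a figure can be computed, the Python snippet below assumes it is the sample standard deviation of repeated measurements of one dimension divided by the absolute value of their mean; the paper may use a different estimator. The function name `relative_standard_uncertainty` and the sample values are hypothetical and are not data from the paper.

```python
import statistics


def relative_standard_uncertainty(measurements):
    """Return s / |mean| for repeated measurements of a single dimension.

    Assumption: the standard uncertainty is taken as the sample standard
    deviation (n - 1 in the denominator) of the individual measurements.
    """
    if len(measurements) < 2:
        raise ValueError("at least two measurements are required")
    mean = statistics.mean(measurements)
    if mean == 0:
        raise ValueError("mean is zero; relative uncertainty is undefined")
    return statistics.stdev(measurements) / abs(mean)


# Hypothetical repeated measurements of one box edge, in millimetres.
lengths_mm = [300.0, 310.0, 290.0, 305.0, 295.0]
u_rel = relative_standard_uncertainty(lengths_mm)
print(f"relative standard uncertainty: {u_rel:.2%}")
```

A result below 0.06 for a given edge would correspond to the "less than 6%" bound quoted for the checkerboard case, under the assumption stated above.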


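Monocular vision 3D measurement method for cuboids based on Mask-RCNN and SFM

SONG Le1,2, HOU Yupeng1,2, ZHANG Junpeng1, WU Tong1, QI Haoming1, SHANG Enhao1

(1. State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin 300072, China; 2. Laboratory of Micro/Nano Manufacturing Technology, Tianjin University, Tianjin 300072, China)

Abstract: To overcome the limitation that structure from motion (SFM) requires multi-view image capture, and to achieve automated 3D measurement, this paper proposes a monocular vision measurement method for cuboids based on Mask-region convolutional neural networks (Mask-RCNN) and SFM. Taking the 3D measurement of a box as an example, the method consists of three parts: measurement point extraction, transformation matrix calculation, and 3D mapping measurement. Only a single calibration is needed to obtain the intrinsic parameters, and deep learning is used to achieve automated single-view 3D measurement, which avoids complex reconstruction while lowering the application requirements of visual measurement methods. Experimental results show that the relative standard uncertainty of the measurement results is within 6% with a checkerboard marker and within 8% with the box's own markers.

Key words: deep learning; Mask-region convolutional neural networks; monocular vision; structure from motion; 3D measurement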

Citation format: SONG Le, HOU Yupeng, ZHANG Junpeng, et al. Monocular visual 3D cuboid measurement method based on Mask-RCNN and SFM. Journal of Measurement Science and Instrumentation, 2023, 14(2): 127-136. DOI: 10.3969/j.issn.1674-8042.2023.02.001
