BAI Yanqiong, ZHENG Yufu, TIAN Hong
(School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China)
Abstract: In the study of autonomous driving, understanding the road scene is key to improving driving safety. Semantic segmentation divides an image at the pixel level into regions associated with semantic categories, helping vehicles perceive and understand the surrounding road environment and thereby improving driving safety. Deeplabv3+ is a popular semantic segmentation model, but in its segmentation tasks small targets are often missed and similar-looking objects are easily misjudged, which leads to rough segmentation boundaries and reduced semantic accuracy. To address this issue, this study builds on the Deeplabv3+ network structure and combines it with an attention mechanism to increase the weight of the segmented regions, proposing an improved road scene semantic segmentation method that fuses Deeplabv3+ with attention mechanisms. First, a pair of parallel position attention and channel attention modules is introduced at the Deeplabv3+ encoding end to capture more spatial context and high-level semantic information. Then, at the decoding end, an attention mechanism is introduced to restore spatial detail information, and the data are normalized to accelerate model convergence. Models with attention introduced in different ways are compared and tested on the CamVid and Cityscapes datasets. The experimental results show that the mean Intersection over Union (mIoU) of the improved model is boosted by 6.88% and 2.58% on the two datasets, respectively, outperforming Deeplabv3+. The method does not significantly increase the computational cost or complexity of the network and achieves a good balance between speed and accuracy.
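The parallel attention design described for the encoding end can be illustrated with a minimal sketch. Assuming a feature map of shape (C, H, W), the position branch builds an (H·W)×(H·W) affinity between spatial locations while the channel branch builds a C×C affinity between channels; the learned 1×1 query/key/value convolutions of the full modules are omitted here, and `gamma` stands in for the residual weight that the real model learns:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(x, gamma=0.1):
    # x: (C, H, W). Each spatial position attends to every other position,
    # aggregating long-range spatial context into its feature vector.
    c, h, w = x.shape
    flat = x.reshape(c, h * w)           # (C, N) with N = H*W
    energy = flat.T @ flat               # (N, N) pairwise position affinity
    attn = softmax(energy, axis=-1)      # row-normalized attention weights
    out = flat @ attn.T                  # (C, N) context-weighted features
    return (gamma * out + flat).reshape(c, h, w)

def channel_attention(x, gamma=0.1):
    # x: (C, H, W). Models inter-channel dependencies via a C x C map,
    # re-weighting semantically correlated channels.
    c, h, w = x.shape
    flat = x.reshape(c, -1)              # (C, N)
    energy = flat @ flat.T               # (C, C) channel affinity
    attn = softmax(energy, axis=-1)
    out = attn @ flat                    # (C, N) channel-mixed features
    return (gamma * out + flat).reshape(c, h, w)

# Parallel fusion at the encoding end: run both branches and sum them.
x = np.random.rand(8, 4, 4).astype(np.float32)
fused = position_attention(x) + channel_attention(x)
print(fused.shape)  # (8, 4, 4)
```

Summing the two branch outputs keeps the fused map the same shape as the input, so the module can be dropped between encoder stages without altering the rest of the Deeplabv3+ pipeline.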
Key words: autonomous driving; road scene; semantic segmentation; Deeplabv3+; attention mechanism
引用格式: BAI Yanqiong, ZHENG Yufu, TIAN Hong. Semantic segmentation method of road scene based on Deeplabv3+ and attention mechanism. Journal of Measurement Science and Instrumentation, 2021, 12(4): 412-422. DOI: 10.3969/j.issn.1674-8042.2021.04.005