Multidimensional attention and multiscale upsampling for semantic segmentation

LU Zhongda1,2, ZHANG Chunda1,2, WANG Lijing1,2, XU Fengxia1,2

(1. School of Mechanical and Electrical Engineering, Qiqihar University, Qiqihar 161000, China; 2. Heilongjiang Province Collaborative Innovation Center for Intelligent Manufacturing Equipment Industrialization, Qiqihar 161000, China)

Abstract: Semantic segmentation is a pixel-level classification task, and contextual information has an important impact on segmentation performance. To capture richer contextual information, we adopt ResNet as the backbone network and design an encoder-decoder architecture based on a multidimensional attention (MDA) module and a multiscale upsampling (MSU) module. The MDA module computes attention matrices along three dimensions to capture the dependencies of each position and to adaptively capture image features. The MSU module uses parallel branches to capture multiscale image features, and aggregating these multiscale features enhances contextual information. A series of experiments on the Cityscapes and CamVid datasets demonstrates the validity of the model.

Key words: semantic segmentation; attention mechanism; multiscale feature; convolutional neural network (CNN); residual network (ResNet)
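
As a rough illustration of the two ideas named in the abstract, the following pure-Python sketch computes one softmax-normalized affinity matrix per dimension of a C x H x W feature map (channel, height, width) and aggregates parallel pooling branches at several scales. The abstract does not specify how the MDA and MSU modules are built, so the dot-product affinity, the average-pooling branches, the nearest-neighbor upsampling, the scale set (1, 2, 4), and all function names below are assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def slices_along(feat, axis):
    """Flatten a C x H x W nested list into one vector per index along `axis`."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    if axis == 0:  # one vector per channel
        return [[feat[c][h][w] for h in range(H) for w in range(W)] for c in range(C)]
    if axis == 1:  # one vector per row
        return [[feat[c][h][w] for c in range(C) for w in range(W)] for h in range(H)]
    # one vector per column
    return [[feat[c][h][w] for c in range(C) for h in range(H)] for w in range(W)]


def attention_matrix(feat, axis):
    """Assumed form: row-wise softmax of dot-product affinities between slices."""
    vecs = slices_along(feat, axis)
    return [softmax([sum(a * b for a, b in zip(u, v)) for v in vecs]) for u in vecs]


def avg_pool(fmap, k):
    """Non-overlapping k x k average pooling of an H x W map (H, W divisible by k)."""
    H, W = len(fmap), len(fmap[0])
    return [[sum(fmap[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
             for j in range(0, W, k)]
            for i in range(0, H, k)]


def upsample_nearest(fmap, k):
    """Nearest-neighbor upsampling of a 2D map by an integer factor k."""
    H, W = len(fmap), len(fmap[0])
    return [[fmap[i // k][j // k] for j in range(W * k)] for i in range(H * k)]


def multiscale_aggregate(fmap, scales=(1, 2, 4)):
    """Assumed form: pool at each scale in parallel, upsample back, then average."""
    H, W = len(fmap), len(fmap[0])
    branches = [upsample_nearest(avg_pool(fmap, k), k) for k in scales]
    return [[sum(b[i][j] for b in branches) / len(branches) for j in range(W)]
            for i in range(H)]


if __name__ == "__main__":
    # Toy 2 x 4 x 4 feature map (hypothetical values).
    feat = [[[0.1 * (c + h + w) for w in range(4)] for h in range(4)] for c in range(2)]
    mats = [attention_matrix(feat, axis) for axis in range(3)]
    print([(len(m), len(m[0])) for m in mats])  # [(2, 2), (4, 4), (4, 4)]
    fused = multiscale_aggregate(feat[0])
    print(len(fused), len(fused[0]))  # 4 4
```

Each attention matrix is square in the size of its own dimension, and every row sums to 1, so it can be read as how strongly each channel, row, or column attends to the others. How the three matrices are fused with the features, and how the upsampling branches are combined in the decoder, is left open here because the abstract does not say.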


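Research on semantic segmentation based on multidimensional attention and multiscale upsampling

LU Zhongda1,2, ZHANG Chunda1,2, WANG Lijing1,2, XU Fengxia1,2

(1. School of Mechanical and Electrical Engineering, Qiqihar University, Qiqihar 161000, Heilongjiang, China; 2. Heilongjiang Province Collaborative Innovation Center for Intelligent Manufacturing Equipment Industrialization, Qiqihar 161000, Heilongjiang, China)

Abstract: Semantic segmentation performs pixel-level classification, and contextual information has an important impact on segmentation performance. To obtain richer contextual information, ResNet is adopted as the backbone network, and an encoder-decoder structure is designed based on a multidimensional attention (MDA) module and a multiscale upsampling (MSU) module. The MDA module computes attention matrices along three dimensions to obtain the dependencies of each position, while the attention mechanism adaptively captures image features. The MSU module uses parallel branches to capture the multiscale features of the image, and multiscale feature aggregation effectively enhances the image's contextual information. A series of experiments on the Cityscapes and CamVid datasets shows that the network effectively improves image segmentation accuracy.

Key words: semantic segmentation; attention mechanism; multiscale feature; convolutional neural network; residual network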

Citation format: LU Zhongda, ZHANG Chunda, WANG Lijing, et al. Multidimensional attention and multiscale upsampling for semantic segmentation. Journal of Measurement Science and Instrumentation, 2022, 13(1): 68-78. DOI: 10.3969/j.issn.1674-8042.2022.01.008
