Abstract
To address the low segmentation accuracy of existing few-shot semantic segmentation models on unseen novel classes, a few-shot semantic segmentation algorithm based on meta-learning is proposed. First, depthwise separable convolutions are used to improve the conventional backbone network, and the encoder is pre-trained on the ImageNet dataset. Second, the pre-trained backbone network maps the support and query images into a deep feature space. Finally, the ground-truth masks of the support images are used to separate the support features into object foreground and background, and an adaptive meta-learning classifier is constructed with a vision transformer. Extensive experiments were conducted on the PASCAL-5i dataset. With VGG-16, ResNet-50 and ResNet-101 backbone networks, the proposed model achieves a mean intersection over union (mIoU) of 47.1%, 58.3% and 60.4%, respectively, in the 1-shot setting, and 49.6%, 60.2% and 62.1% in the 5-shot setting. On the COCO-20i dataset, it achieves an mIoU of 23.6%, 30.3% and 30.7% in the 1-shot setting, and 30.1%, 34.7% and 35.2% in the 5-shot setting.
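As a rough illustration of two ideas named in the abstract, the sketch below separates support features into foreground and background using a ground-truth mask, and averages per-class IoU into mIoU. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the separation is performed, so masked average pooling is assumed here, and the names `masked_average`, `separate_support`, `iou` and `mean_iou` are hypothetical. Plain Python lists stand in for the tensors a real model would use.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W x C deep features of one support image
Mask = List[List[int]]           # H x W, 1 = object foreground, 0 = background


def masked_average(features: FeatureMap, mask: Mask, keep: int) -> Vector:
    """Average the feature vectors at positions where the mask equals `keep`."""
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, m in zip(feat_row, mask_row):
            if m == keep:
                for c in range(channels):
                    total[c] += vec[c]
                count += 1
    # An empty region yields a zero vector instead of a division by zero.
    return [t / count for t in total] if count else total


def separate_support(features: FeatureMap, mask: Mask) -> Tuple[Vector, Vector]:
    """Return (foreground, background) vectors (assumed: masked average pooling)."""
    return masked_average(features, mask, 1), masked_average(features, mask, 0)


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of the same size."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p == 1 and t == 1) else 0
            union += 1 if (p == 1 or t == 1) else 0
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


def mean_iou(per_class_ious: Sequence[float]) -> float:
    """mIoU: the mean of the per-class IoU values."""
    return sum(per_class_ious) / len(per_class_ious)


if __name__ == "__main__":
    # Toy 2 x 2 feature map with 2 channels; the top row is foreground.
    feats = [[[1.0, 0.0], [3.0, 0.0]],
             [[0.0, 2.0], [0.0, 4.0]]]
    gt_mask = [[1, 1], [0, 0]]
    fg, bg = separate_support(feats, gt_mask)
    print(fg, bg)
    print(mean_iou([iou([[1, 1], [0, 0]], [[1, 0], [0, 0]]), 1.0]))
```

In this simplification each class contributes a single IoU value. Benchmark protocols typically accumulate intersections and unions over all test images of a class before averaging over classes, so the reported figures should not be expected to match a per-image average.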
Key words
few-shot semantic segmentation /
feature separation /
meta-learning /
depthwise separable convolution /
vision transformer /
object foreground /
self-adaptation
Funding
Key Research and Development Program of Shandong Province (2021RKL02001)