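Abstract
Drawing on the features at each level of a convolutional neural network, this paper proposes a clothing image parsing method based on multi-scale fusion enhancement. A fusion enhancement module takes global information into account while effectively fusing the semantic information and the features at different scales. Results show that the method achieves an average F1 score of 60.57% on the Fashion Clothing test set and a mean intersection over union (MIoU) of 54.93% on the LIP (Look Into Person) validation set. The method effectively improves the accuracy of clothing image parsing.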
Keywords
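With the rapid development of the clothing and Internet industries, clothing image parsing, an important application of image processing, has great development potential. The goal of clothing image parsing is pixel-level recognition of the components of a clothing image, dividing the image into regions according to a set of categories. Clothing image parsing is a specific form of fine-grained segmentation in computer vision. Therefore, research on clothing image parsing for clothing …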
Clothing image parsing has broad application prospects in fields such as artificial intelligence. Chen …
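The attention mechanism in deep learning originates from a property of human vision: when observing something, people selectively take in its important features and ignore the unimportant ones. Borrowing from this, the attention mechanism in deep learning aims to adaptively aggregate related features, helping a model assign different weights to its input and obtain more useful features. It is therefore widely used in computer vision tasks such as semantic segmentation, object recognition, and image classification. Hu …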
The structure of the proposed multi-scale fusion enhanced network is shown in …

Fig. 1 Structure of the multi-scale fusion enhanced network

Fig. 2 Fusion enhancement module
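The paper describes the fusion enhancement module as using channel attention to prioritize global information before merging features from different scales. The NumPy sketch below illustrates that general idea only; it is not the authors' module. The function name `channel_attention_fuse`, the squeeze-and-excitation style gate, the nearest-neighbor upsampling, and all tensor shapes are assumptions made for illustration.

```python
import numpy as np


def channel_attention_fuse(low, high, w1, w2):
    """Fuse a fine-scale feature map with a coarser, more semantic one.

    low:  (C, H, W) fine-scale features
    high: (C, h, w) coarse-scale features, with H and W integer multiples of h and w
    w1:   (C // r, C) and w2: (C, C // r) weights of a two-layer gating MLP
    All shapes and the gating design are illustrative assumptions.
    """
    # Squeeze: global average pooling summarizes each channel of the coarse map.
    context = high.mean(axis=(1, 2))                  # (C,)
    # Excite: bottleneck MLP with ReLU, then a sigmoid gives weights in (0, 1).
    hidden = np.maximum(w1 @ context, 0.0)            # (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))       # (C,)
    # Reweight the fine-scale channels using the global context.
    weighted_low = low * gate[:, None, None]
    # Nearest-neighbor upsampling brings the coarse map to the fine resolution.
    fy = low.shape[1] // high.shape[1]
    fx = low.shape[2] // high.shape[2]
    upsampled = high.repeat(fy, axis=1).repeat(fx, axis=2)
    return weighted_low + upsampled


rng = np.random.default_rng(0)
C, r = 8, 4
low = rng.standard_normal((C, 16, 16))
high = rng.standard_normal((C, 8, 8))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
fused = channel_attention_fuse(low, high, w1, w2)
print(fused.shape)  # (8, 16, 16)
```

In a trained network the gating weights would be learned, and the upsampling and merge operations might differ; the sketch only shows how a global channel descriptor can modulate finer-scale features before fusion.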
As …
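All experiments were run on a server with two NVIDIA GTX 1070 GPUs, in a deep learning environment built on Ubuntu 18.04, Python 3.6, and PyTorch 0.4.1. A pretrained ResNet10… with a stride of 16 is used …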
The experiments use two public datasets: Fashion Clothing and LIP (Look Into Person). The Fashion Clothing dataset consists of Clothing Co-Parsin…
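On the Fashion Clothing dataset, network performance is evaluated with five metrics: pixel accuracy, foreground accuracy, average precision, average recall, and average F1 score. On the LIP dataset, it is evaluated with four metrics: pixel accuracy, mean accuracy, per-class intersection over union (IoU), and mean intersection over union (MIoU).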
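Several of these metrics can be derived from a single per-class confusion matrix. The following is a minimal pure-Python sketch under stated assumptions: the helper names `confusion_matrix` and `parsing_metrics` are hypothetical, labels are flattened integer class indices, and the mean F1 is taken as the unweighted average of per-class F1 scores, which may differ from the exact averaging used in the paper.

```python
def confusion_matrix(pred, truth, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, truth):
        cm[t][p] += 1
    return cm


def parsing_metrics(cm):
    """Derive pixel accuracy, per-class IoU, mean IoU and mean F1 from a confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[c][c] for c in range(n))
    ious, f1s = [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels labeled as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels labeled as class c
        union = tp + fp + fn
        # Classes absent from both prediction and ground truth score 0 here (an assumption).
        ious.append(tp / union if union else 0.0)
        f1s.append(2 * tp / (2 * tp + fp + fn) if union else 0.0)
    return {
        "pixel_accuracy": correct / total,
        "per_class_iou": ious,
        "mean_iou": sum(ious) / n,
        "mean_f1": sum(f1s) / n,
    }


# Toy example with three classes and six flattened pixels.
pred = [0, 0, 1, 1, 2, 2]
truth = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(pred, truth, num_classes=3)
metrics = parsing_metrics(cm)
print(metrics["pixel_accuracy"], metrics["mean_iou"])
```

In practice the confusion matrix would be accumulated over every image in the test or validation set before the metrics are computed, so that rare classes such as belts or glasses are not dominated by per-image noise.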

Fig. 3 Comparison of parsing results of different methods on the Fashion Clothing dataset
To further verify the effectiveness and generalization ability of the method, …
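This paper proposed a clothing image parsing method based on multi-scale fusion enhancement. The fusion enhancement module extracts features at different scales and then uses a channel attention mechanism to prioritize global features and enhance the multi-scale feature information, so that more detail is captured. Experimental results show that the method improves the parsing of larger targets and also clearly improves the parsing of small objects such as hats, belts, and glasses. Although results for smaller objects improve, their parsing accuracy remains lower than that of other categories. Future work will consider using object detection to localize small objects and thereby raise their parsing accuracy.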
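Author contributions
陈丽芳 (CHEN Lifang): conception, design, and analysis of the network architecture; paper revision and proofreading.
余恩婷 (YU Enting): implementation of the network architecture and experimental design; paper writing and revision.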
References
LIU S, SONG Z, LIU G, et al. Street-to-shop: cross-scenario clothing retrieval via parts alignment and auxiliary set[C]//2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence: IEEE, 2012: 3330-3337.
XU Hui, BAI Meili, WAN Taoruan, et al. Semantic analysis and retrieval recommendation of clothing images based on deep learning[J]. Journal of Basic Science of Textile Colleges, 2020, 33(3): 64.
ZHU S, URTASUN R, FIDLER S, et al. Be your own Prada: fashion synthesis with structural coherence[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 1680-1688.
LIU X, ZHANG M, LIU W, et al. BraidNet: braiding semantics and details for accurate human parsing[C]//Proceedings of the 27th ACM International Conference on Multimedia. New York: Association for Computing Machinery, 2019: 338-346.
LUO Y, ZHENG Z, ZHENG L, et al. Macro-micro adversarial network for human parsing[C]//Proceedings of the European Conference on Computer Vision (ECCV). Berlin: Springer, 2018: 418-434.
WANG W, ZHANG Z, QI S, et al. Learning compositional neural information fusion for human parsing[C]//Proceedings of the IEEE International Conference on Computer Vision. Seoul: IEEE, 2019: 5703-5713.
GONG K, GAO Y, LIANG X, et al. Graphonomy: universal human parsing via graph transfer learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 7450-7459.
CHEN L C, YANG Y, WANG J, et al. Attention to scale: scale-aware semantic image segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 3640-3649.
ZHAO Y, LI J, ZHANG Y, et al. Multi-class part parsing with joint boundary-semantic awareness[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Long Beach: IEEE, 2019: 9177-9186.
LUO X, SU Z, GUO J, et al. Trusted guidance pyramid network for human parsing[C]//Proceedings of the 26th ACM International Conference on Multimedia. New York: Association for Computing Machinery, 2018: 654-662.
HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7132-7141.
WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision (ECCV). Berlin: Springer, 2018: 3-19.
WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7794-7803.
HU J, SHEN L, ALBANIE S, et al. Gather-excite: exploiting feature context in convolutional neural networks[C]//NIPS’18: Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2018: 9423-9433.
LI H, XIONG P, AN J, et al. Pyramid attention network for semantic segmentation[J/OL]. [2021-05-06]. https://arxiv.org/abs/1805.10180.
HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.
YANG W, LUO P, LIN L. Clothing co-parsing by joint image segmentation and labeling[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 3182-3189.
YAMAGUCHI K, KIAPOUR M H, ORTIZ L E, et al. Parsing clothing in fashion photographs[C]//2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence: IEEE, 2012: 3570-3577.
LIU S, FENG J, DOMOKOS C, et al. Fashion parsing with weak color-category labels[J]. IEEE Transactions on Multimedia, 2013, 16(1): 253.
GONG K, LIANG X, ZHANG D, et al. Look Into Person: self-supervised structure-sensitive learning and a new benchmark for human parsing[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 932-940.
WANG W, ZHU H, DAI J, et al. Hierarchical human parsing with typed part-relation reasoning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 8929-8939.
YAMAGUCHI K, KIAPOUR M H, BERG T L. Paper doll parsing: retrieving similar styles to parse clothing items[C]//Proceedings of the IEEE International Conference on Computer Vision. Sydney: IEEE, 2013: 3519-3526.
CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(4): 834.
LUC P, COUPRIE C, CHINTALA S, et al. Semantic segmentation using adversarial networks[C]//Workshop on Adversarial Training, NIPS 2016. Barcelona: IEEE, 2016: 1-9.
LIANG X, GONG K, SHEN X, et al. Look Into Person: joint body parsing & pose estimation network and a new benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(4): 871.
RUAN T, LIU T, HUANG Z, et al. Devil in the details: towards accurate single and multiple human parsing[C]//The Thirty-Third AAAI Conference on Artificial Intelligence. Menlo Park: Association for the Advancement of Artificial Intelligence, 2019: 4814-4821.
ZHANG S, QI G J, CAO X, et al. Human parsing with pyramidical gather-excite context[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 31(3): 1016.