ISSN 1004-4140
CN 11-3017/P
TANG S J, YUAN T Q, LI S Y, et al. Few-shot Periodic Video Image Segmentation Based on LSTM and Cross-attention Mechanism[J]. CT Theory and Applications, xxxx, x(x): 1-12. DOI: 10.15953/j.ctta.2024.033. (in Chinese).

Few-shot Periodic Video Image Segmentation Based on LSTM and Cross-attention Mechanism

More Information
  • Received Date: February 26, 2024
  • Revised Date: March 23, 2024
  • Accepted Date: April 07, 2024
  • Available Online: May 13, 2024
  • With the development of modern video technology, segmentation of periodic-motion video images has found important applications in motion analysis, medical imaging, and other fields. In this study, we designed a novel deep-learning network for periodic motion detection and segmentation that combines a convolutional long short-term memory (LSTM) network with a cross-attention mechanism. With relatively few labels, the network effectively captures the spatiotemporal context of the objects of interest in a video sequence, achieving cross-frame consistency and accurate segmentation. Experimental results show that the proposed method performs well on periodic motion video datasets with few labeled samples. On ordinary videos, the average region similarity and contour accuracy were 67.51% and 72.97%, respectively, 1% to 1.5% higher than those obtained with the traditional method. On medical videos, the average region similarity and contour accuracy were 59.93% and 90.56%, respectively. Compared with DAN and U-Net, the proposed method increased the region similarity by 12.92% and 8.85% and the contour accuracy by 20.09% and 12.89%, respectively, thus achieving higher accuracy and stability.
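
The two ideas named in the abstract, region similarity and cross-attention, can be illustrated with a minimal NumPy sketch. It is not the authors' network. It assumes that region similarity is the Jaccard index (intersection over union) of binary masks, as is conventional in video object segmentation benchmarks, and that cross-attention takes the plain scaled dot-product form without learned projections. The function names `region_similarity` and `cross_attention` are hypothetical.

```python
import numpy as np


def region_similarity(pred, gt):
    """Jaccard index (IoU) between two binary masks of the same shape.

    Assumption: this is the usual definition of region similarity;
    the abstract itself does not spell out the formula.
    """
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def cross_attention(query_feats, support_feats):
    """Scaled dot-product cross-attention without learned projections.

    query_feats:   (Nq, d) features of the frame to be segmented
    support_feats: (Ns, d) features of a labeled support frame
    Returns the attended features (Nq, d) and the weights (Nq, Ns).
    """
    q = np.asarray(query_feats, dtype=float)
    s = np.asarray(support_feats, dtype=float)
    scores = q @ s.T / np.sqrt(q.shape[1])
    # Subtract the row maximum for a numerically stable softmax.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ s, weights


# Toy masks: 2 overlapping pixels out of 4 in the union.
pred_mask = [[1, 1, 0], [0, 1, 0]]
gt_mask = [[1, 0, 0], [0, 1, 1]]
iou = region_similarity(pred_mask, gt_mask)

# Toy features: 4 query positions attend over 6 support positions.
rng = np.random.default_rng(0)
attended, attn = cross_attention(rng.normal(size=(4, 8)),
                                 rng.normal(size=(6, 8)))
```

In this sketch each query position receives a weighted average of the support features, which is one simple way information from a few labeled frames could be passed to unlabeled frames. A real model would add learned query, key, and value projections and a recurrent module for temporal context.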
