Few-shot Periodic Video Image Segmentation Based on LSTM and Cross-attention Mechanism

TANG Shaojie; YUAN Tengqi; LI Siyu; LI Shubo; ZHANG Ting; WEI Qiuyue; YAO Hongping

doi:10.15953/j.ctta.2024.033

Volume 34 Issue 4

Jul. 2025


Citation: TANG S J, YUAN T Q, LI S Y, et al. Few-shot Periodic Video Image Segmentation Based on LSTM and Cross-attention Mechanism[J]. CT Theory and Applications, 2025, 34(4): 667-676. DOI: 10.15953/j.ctta.2024.033. (in Chinese).

1. School of Automation, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
2. Department of Pharmacy, The First Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, China

Received Date: February 26, 2024
Revised Date: March 23, 2024
Accepted Date: April 07, 2024
Available Online: May 13, 2024

Abstract

With the development of modern video technology, the segmentation of periodic-motion video images has important applications in motion analysis, medical imaging, and other fields. In this study, we designed a novel deep-learning network for periodic motion detection and segmentation that combines a convolutional long short-term memory network (ConvLSTM) with a cross-attention mechanism. With relatively few labels, the network effectively captures the spatiotemporal context of the objects of interest in a video sequence, achieving cross-frame consistency and accurate segmentation. Experimental results show that the proposed method performs well on periodic-motion video datasets with few labeled samples. On ordinary videos, the average region similarity and contour accuracy were 67.51% and 72.97%, respectively, an improvement of 1% to 1.5% over the traditional method. On medical videos, the average region similarity and contour accuracy were 59.93% and 90.56%, respectively. Compared with DAN and U-Net, the proposed method increased the region similarity by 12.92% and 8.85% and the contour accuracy by 20.09% and 12.89%, respectively, thus achieving higher accuracy and stability.
Keywords: deep learning; video segmentation; image segmentation; LSTM; cross-attention mechanism
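
The abstract reports "region similarity" without defining it. In the video object segmentation literature this term usually denotes the intersection over union (Jaccard index) between a predicted mask and the ground-truth mask, averaged over frames. The sketch below assumes that convention; the paper's exact evaluation protocol is not stated here, and the function names `region_similarity` and `mean_region_similarity` are hypothetical, chosen only for illustration.

```python
from typing import Sequence

# A binary mask is a list of rows; any non-zero entry is treated as foreground.
Mask = Sequence[Sequence[int]]


def region_similarity(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two binary masks of identical shape."""
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            p_on, t_on = bool(p), bool(t)
            intersection += int(p_on and t_on)
            union += int(p_on or t_on)

    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_region_similarity(preds: Sequence[Mask], truths: Sequence[Mask]) -> float:
    """Average the per-frame score over a sequence of frames."""
    if len(preds) != len(truths) or not preds:
        raise ValueError("need the same non-zero number of frames on both sides")
    scores = [region_similarity(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)


# Toy two-frame example (values are illustrative, not from the paper).
frame_pred = [[1, 1, 0], [0, 1, 0]]
frame_truth = [[1, 0, 0], [0, 1, 1]]
empty = [[0, 0, 0], [0, 0, 0]]

print(region_similarity(frame_pred, frame_truth))  # 2 shared pixels / 4 in the union
print(mean_region_similarity([frame_pred, empty], [frame_truth, empty]))
```

Under this reading, a reported average of 67.51% would mean that, per frame, predicted and ground-truth regions share roughly two thirds of their combined area. Contour accuracy is a separate, boundary-based measure and is not covered by this sketch.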
