ISSN 1004-4140
CN 11-3017/P

Machine Learning Prediction Models for Staging of Non-small Cell Lung Cancer Patients Using Radiomics

ZHOU Jie, ZHENG Yanting, JIANG Shuqi, AN Jie, QIU Shijun, CHEN Huai

Citation: ZHOU J, ZHENG Y T, JIANG S Q, et al. Machine Learning Prediction Models for Staging of Non-small Cell Lung Cancer Patients Using Radiomics[J]. CT Theory and Applications, xxxx, x(x): 1-9. DOI: 10.15953/j.ctta.2024.186. (in Chinese).


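Funding: Guangdong Basic and Applied Basic Research Foundation (Radiomics for differentiating benign from malignant mediastinal lymph nodes and for prognostic evaluation in patients with lung cancer (2022A1515011028)).
Details
    About the author:

    ZHOU Jie, female, associate chief physician at the First Affiliated Hospital of Guangzhou University of Chinese Medicine; her research focuses on medical imaging of thoracic diseases. E-mail: zhoujie1989@gzucm.edu.cn

    Corresponding author:

    CHEN Huai, male, professor and doctoral supervisor at the Second Affiliated Hospital of Guangzhou Medical University; his research focuses on medical imaging of thoracic diseases. E-mail: chenhuai1977@163.com

  • CLC number: P 631.3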


  • Abstract:

    Objective: To assess the value of computed tomography (CT) radiomics-based machine learning classification models for predicting the stage of non-small cell lung cancer (NSCLC). Methods: The Lung1 dataset was downloaded from The Cancer Imaging Archive (TCIA), and 291 eligible NSCLC patients were selected and divided into two groups: Group 1 (stages I and II) and Group 2 (stages III and IV). From each tumor lesion, 1037 radiomics features were extracted and then selected using the t-test and the least absolute shrinkage and selection operator (LASSO) algorithm. The CT morphological features of the lesions were screened using t-tests and chi-squared tests. Five machine learning classifiers (logistic regression, random forest, Gaussian naive Bayes, support vector machine, and AdaBoost) were trained to build prediction models. Their performance was evaluated with receiver operating characteristic (ROC) curves, and the best model was selected. Finally, data from 91 patients collected at our hospital were used for external validation. Results: Feature selection yielded 13 radiomics features with relatively high diagnostic value, which were used to build the NSCLC staging prediction models. Of the five machine learning classification models, the random forest model performed best, with the highest area under the curve (AUC) on the validation set (0.740). In the external validation analysis, this model had an AUC of 1.000, a sensitivity of 1.000, and a specificity of 1.000 on the training set, and an AUC of 0.698, a sensitivity of 0.873, and a specificity of 0.500 on the test set. Among the CT morphological features, only lesion size differed significantly between patients at different stages. Conclusion: CT radiomics-based machine learning classification models have some ability to predict the stage of patients with NSCLC.
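The workflow in the Methods (t-test filter, LASSO selection, five classifiers compared by validation AUC) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic (`make_classification` stands in for the radiomics matrix, with 200 features instead of 1037), and the 70/30 split, the standardization step, the p < 0.05 threshold, and all hyperparameters are assumptions that the abstract does not specify.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LassoCV, LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the radiomics feature matrix (NOT the Lung1 data).
# Label 0 plays the role of Group 1 (stages I-II), label 1 of Group 2 (stages III-IV).
X, y = make_classification(n_samples=291, n_features=200, n_informative=10,
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)  # split ratio is an assumption

# Standardize using training-set statistics only (assumed preprocessing step).
scaler = StandardScaler().fit(X_train)
X_train, X_val = scaler.transform(X_train), scaler.transform(X_val)

# Step 1: univariate t-test filter between the two groups.
_, p_values = ttest_ind(X_train[y_train == 0], X_train[y_train == 1], axis=0)
keep = np.flatnonzero(p_values < 0.05)

# Step 2: LASSO with 10-fold cross-validation; keep features with nonzero weight.
lasso = LassoCV(cv=10, random_state=0).fit(X_train[:, keep], y_train)
selected = keep[lasso.coef_ != 0]
if selected.size == 0:  # defensive fallback for this toy example
    selected = keep


def scores_of(model, features):
    """Return continuous scores for ROC analysis."""
    if hasattr(model, "predict_proba"):
        return model.predict_proba(features)[:, 1]
    return model.decision_function(features)


# Step 3: train the five classifiers and compare validation AUC.
models = {
    "Logistic": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "GNB": GaussianNB(),
    "SVM": SVC(),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}
aucs = {}
for name, model in models.items():
    model.fit(X_train[:, selected], y_train)
    aucs[name] = roc_auc_score(y_val, scores_of(model, X_val[:, selected]))

best = max(aucs, key=aucs.get)
for name, auc in aucs.items():
    print(f"{name}: validation AUC = {auc:.3f}")
print("best model on validation AUC:", best)
```

Selecting features on the training split only, as done here, keeps the validation AUC from being inflated by information leakage; whether the study followed the same order is not stated in the abstract.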

  • Figure 1. Delineation of the lesion region of interest (ROI) and 3D reconstruction

    Figure 2. Flowchart of model development and validation

    Figure 3. Selection of the λ value with the smallest cross-validation error for the LASSO model. The dashed line marks the log(λ) value corresponding to the optimal λ.

    Figure 4. Coefficient paths of the features during LASSO selection with 10-fold cross-validation. Each curve shows how the coefficient of one independent variable changes with λ.

    Figure 5. The 13 feature parameters retained after LASSO selection

    Figure 6. ROC curves of the five machine learning classification models on the training set

    Figure 7. ROC curves of the five machine learning classification models on the validation set

    Figure 8. ROC curve of the random forest classification model in external validation on the test set

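    Table 1. Basic clinical information and CT morphological features of the training and test sets (n, %)

    | Parameter | Training: Group 1 (n=103) | Training: Group 2 (n=188) | Training: t/χ² | Training: P | Test: Group 1 (n=20) | Test: Group 2 (n=71) | Test: t/χ² | Test: P |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | Sex | | | 0.266 | 0.606 | | | 0.268 | 0.605 |
    | Male | 71 (68.932) | 124 (65.957) | | | 15 (75.000) | 49 (69.014) | | |
    | Female | 32 (31.068) | 64 (34.043) | | | 5 (25.000) | 22 (30.986) | | |
    | Age (years) | 72.649±8.768 | 66.877±9.698 | 4.795 | <0.001 | 61.621±9.006 | 63.662±9.601 | 0.851 | 0.401 |
    | Lesion size (mm) | 37.896±20.348 | 45.393±21.492 | −2.889 | 0.004 | 39.036±19.868 | 50.345±21.325 | −2.102 | 0.038 |
    | Pathological type | | | 25.235 | <0.001 | | | 5.465 | 0.141 |
    | Adenocarcinoma | 16 (15.534) | 19 (10.106) | | | 1 (5.000) | 9 (12.676) | | |
    | Squamous cell carcinoma | 32 (31.068) | 70 (37.234) | | | 6 (30.000) | 18 (25.352) | | |
    | Large cell carcinoma | 17 (16.505) | 70 (37.234) | | | 2 (10.000) | 21 (29.577) | | |
    | Unclassified | 38 (36.893) | 29 (15.426) | | | 11 (55.000) | 23 (32.394) | | |
    | Deep lobulation sign | | | 0.02 | 0.888 | | | 0.317 | 0.573 |
    | Absent | 55 (53.398) | 102 (54.255) | | | 11 (55.000) | 44 (61.972) | | |
    | Present | 48 (46.602) | 86 (45.745) | | | 9 (45.000) | 27 (38.028) | | |
    | Pleural indentation sign | | | 0.353 | 0.552 | | | 2.887 | 0.089 |
    | Absent | 56 (54.369) | 109 (57.979) | | | 11 (55.000) | 53 (74.648) | | |
    | Present | 47 (45.631) | 79 (42.021) | | | 9 (45.000) | 18 (25.352) | | |
    | Spiculation sign | | | 0.033 | 0.856 | | | 0.846 | 0.358 |
    | Absent | 52 (50.485) | 97 (51.596) | | | 11 (55.000) | 47 (66.197) | | |
    | Present | 51 (49.515) | 91 (48.404) | | | 9 (45.000) | 24 (33.803) | | |
    | Spinous protuberance | | | 0.238 | 0.626 | | | 1.141 | 0.285 |
    | Absent | 38 (36.893) | 64 (34.043) | | | 8 (40.000) | 38 (53.521) | | |
    | Present | 65 (63.107) | 124 (65.957) | | | 12 (60.000) | 33 (46.479) | | |
    | Air bronchogram | | | 0.042 | 0.837 | | | 3.613 | 0.057 |
    | Absent | 96 (93.204) | 174 (92.553) | | | 18 (90.000) | 70 (98.592) | | |
    | Present | 7 (6.796) | 14 (7.447) | | | 2 (10.000) | 1 (1.408) | | |
    | Cavity | | | 0.421 | 0.516 | | | 1.501 | 0.221 |
    | Absent | 91 (88.350) | 161 (85.638) | | | 19 (95.000) | 60 (84.507) | | |
    | Present | 12 (11.650) | 27 (14.362) | | | 1 (5.000) | 11 (15.493) | | |
    | Vacuole sign | | | 0.004 | 0.947 | | | 3.613 | 0.057 |
    | Absent | 94 (91.262) | 172 (91.489) | | | 18 (90.000) | 70 (98.592) | | |
    | Present | 9 (8.738) | 16 (8.511) | | | 2 (10.000) | 1 (1.408) | | |
    | Calcification | | | 0.062 | 0.803 | | | 0.936 | 0.333 |
    | Absent | 93 (90.291) | 168 (89.362) | | | 19 (95.000) | 70 (98.592) | | |
    | Present | 10 (9.709) | 20 (10.638) | | | 1 (5.000) | 1 (1.408) | | |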
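As a worked check of how the χ² and P columns of Table 1 can be obtained for a categorical variable, the sketch below computes the Pearson chi-squared statistic for the training-set sex distribution (71 vs. 124 men, 32 vs. 64 women). It uses only the standard library and assumes no continuity correction; with that assumption the result agrees with the tabulated 0.266 and 0.606, although the statistical software the authors used is not named in this page. The function name `chi_square_2x2` is illustrative.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, the chi-squared survival function reduces to
    # erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: male, female; columns: Group 1, Group 2 (training set, Table 1).
sex_training = [[71, 124], [32, 64]]
chi2, p_value = chi_square_2x2(sex_training)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")  # chi2 = 0.266, P = 0.606
```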

    Table 2. Radiomics feature selection results

    | Radiomics feature | Weight coefficient |
    | --- | --- |
    | original_shape_Flatness | 0.01491 |
    | original_shape_Sphericity | 0.04153 |
    | wavelet-LLH_firstorder_Mean | 0.026735 |
    | wavelet-LLH_glcm_Imc1 | 0.022442 |
    | wavelet-LHH_glcm_ClusterShade | 0.04337 |
    | wavelet-LHH_glszm_SmallAreaLowGrayLevelEmphasis | 0.01344 |
    | wavelet-HLH_firstorder_Skewness | 0.01968 |
    | wavelet-HHL_gldm_DependenceVariance | 0.00985 |
    | wavelet-HHH_firstorder_Skewness | 0.003253 |
    | wavelet-HHH_glcm_Imc2 | 0.05804 |
    | wavelet-LLL_gldm_DependenceEntropy | 0.03631 |
    | log-sigma-2-0-mm-3D_firstorder_Maximum | 0.057039 |
    | log-sigma-3-0-mm-3D_firstorder_Minimum | 0.006818 |

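    Table 3. Performance of the five machine learning classification models on the training set

    | Classification model | AUC (95% CI) | Cutoff | Accuracy | Sensitivity | Specificity | Positive predictive value | Negative predictive value |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | Logistic | 0.712 (0.639-0.786) | 0.654 | 0.668 | 0.643 | 0.722 | 0.805 | 0.534 |
    | Random Forest | 1.000 (NaN-NaN) | 0.620 | 0.988 | 0.998 | 1.000 | 1.000 | 0.968 |
    | GNB | 0.778 (0.711-0.844) | 0.692 | 0.738 | 0.741 | 0.744 | 0.836 | 0.616 |
    | SVM | 0.657 (0.577-0.738) | 0.665 | 0.621 | 0.588 | 0.694 | 0.785 | 0.477 |
    | AdaBoost | 0.997 (0.994-1.000) | 0.503 | 0.973 | 0.973 | 0.989 | 0.994 | 0.940 |
    Note: Logistic, logistic regression; Random Forest, random forest; GNB, Gaussian naive Bayes; SVM, support vector machine; AdaBoost, adaptive boosting.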
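The accuracy, sensitivity, specificity, and predictive-value columns all follow from the confusion matrix obtained by thresholding predicted probabilities at the cutoff. The sketch below shows that calculation on a small made-up example. The labels, probabilities, and cutoff are hypothetical, and treating Group 2 (stages III and IV) as the positive class is an assumption, since this page does not say which group was coded as positive.

```python
def metrics_at_cutoff(y_true, y_prob, cutoff):
    """Confusion-matrix metrics with 'probability >= cutoff' read as positive."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if y == 1 and p >= cutoff)
    fn = sum(1 for y, p in pairs if y == 1 and p < cutoff)
    fp = sum(1 for y, p in pairs if y == 0 and p >= cutoff)
    tn = sum(1 for y, p in pairs if y == 0 and p < cutoff)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical labels (1 = assumed positive class) and predicted probabilities.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.65, 0.55, 0.3, 0.7, 0.4, 0.2, 0.1]

result = metrics_at_cutoff(y_true, y_prob, cutoff=0.6)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these toy values the confusion matrix is TP = 4, FN = 2, FP = 1, TN = 3. The helper does not guard against empty denominators, which a production version would need to handle.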

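    Table 4. Performance of the five machine learning classification models on the internal validation set

    | Classification model | AUC (95% CI) | Cutoff | Accuracy | Sensitivity | Specificity | Positive predictive value | Negative predictive value |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | Logistic | 0.697 (0.579-0.816) | 0.654 | 0.666 | 0.684 | 0.738 | 0.809 | 0.513 |
    | Random Forest | 0.740 (0.634-0.845) | 0.620 | 0.648 | 0.666 | 0.740 | 0.779 | 0.519 |
    | GNB | 0.668 (0.544-0.791) | 0.692 | 0.632 | 0.684 | 0.661 | 0.774 | 0.461 |
    | SVM | 0.629 (0.505-0.753) | 0.665 | 0.600 | 0.678 | 0.612 | 0.752 | 0.470 |
    | AdaBoost | 0.616 (0.488-0.744) | 0.503 | 0.605 | 0.692 | 0.567 | 0.706 | 0.458 |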

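    Table 5. External validation results of the random forest prediction model

    | Dataset | AUC (95% CI) | Cutoff | Accuracy | Sensitivity | Specificity | Positive predictive value | Negative predictive value |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | Training set | 1.000 (NaN-NaN) | 0.600 | 0.991 | 1.000 | 1.000 | 1.000 | 0.974 |
    | Test set | 0.698 (0.556-0.840) | 0.600 | 0.593 | 0.873 | 0.500 | 0.870 | 0.311 |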
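The AUC values in these tables can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes AUC from that definition and adds a percentile bootstrap for a 95% confidence interval. The bootstrap is an assumption chosen for illustration; this page does not state how the authors computed their intervals, and the toy scores are made up.

```python
import random


def auc_score(y_true, y_score):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (illustrative choice of method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [y_true[i] for i in idx]
        if len(set(labels)) < 2:  # a resample needs both classes
            continue
        stats.append(auc_score(labels, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Small textbook-style check: 3 of the 4 positive/negative pairs are ordered correctly.
toy_auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)  # 0.75

# Hypothetical scores for a slightly larger sample.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.35, 0.4, 0.55, 0.6, 0.3, 0.5, 0.65, 0.7, 0.8, 0.9]
ci_low, ci_high = bootstrap_ci(labels, scores)
print(f"AUC = {auc_score(labels, scores):.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```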
Publication history
  • Received: 2024-08-25
  • Revised: 2024-10-19
  • Accepted: 2024-10-24
  • Published online: 2024-11-11
