Machine Learning-Based Diagnostic and Prognostic Models for Malignant Pleural Effusion in Non-Small Cell Lung Cancer

Development of Machine Learning-Driven Diagnostic and Prognostic Models for Non-Small Cell Lung Cancer-Associated Malignant Pleural Effusion

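  • Abstract:
    Objective To build machine learning-based diagnostic and prognostic models for malignant pleural effusion (MPE) in patients with non-M1b stage (AJCC 7th edition) non-small cell lung cancer (NSCLC).
    Methods Records of patients diagnosed with NSCLC from 2010 to 2015 in the Surveillance, Epidemiology, and End Results (SEER) database were analyzed retrospectively, and patients with M1b stage disease were excluded. Two data sets were collected: data 1 (patients with non-M1b stage NSCLC, n=47392) was used to build the MPE diagnostic model, and data 2 (patients with M1a stage NSCLC and MPE, n=2422) was used to build the prognostic model. Least absolute shrinkage and selection operator (LASSO) regression was used to select feature variables, and cases were split between the training and validation sets at a ratio of 7:3. Models were built separately with eight machine learning algorithms. The evaluation metrics were accuracy, precision, recall, F1 score, area under the ROC curve (AUC), decision curve analysis (DCA), the calibration curve, and the precision-recall (PR) curve, with ROC-AUC as the primary metric.
    Results Among patients with non-M1b stage NSCLC, the incidence of MPE was 5.1%, and the 1-year survival rate of patients with MPE was 32.1%. LASSO regression selected 9 diagnosis-related variables and 12 prognosis-related variables. The models built with all eight machine learning algorithms had AUC values above 0.70. Among the diagnostic models, the random forest model performed best (training set AUC=0.908, validation set AUC=0.897); among the prognostic models, the XGBoost model performed best (training set AUC=0.905, validation set AUC=0.875). The other evaluation metrics were good and evenly balanced. SHAP feature importance analysis showed that tumor size, lymph node metastasis, and histological type were important factors in the occurrence of MPE, while chemotherapy was the most prominent prognostic factor.
    Conclusion The random forest diagnostic model built in this study can effectively predict the risk of MPE in patients with non-M1b stage NSCLC, and the XGBoost prognostic model can predict the prognosis of patients with M1a stage NSCLC and MPE.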

     

    Abstract:
    Objective To construct machine learning-based diagnostic and prognostic models for malignant pleural effusion (MPE) in patients with non-M1b stage (AJCC 7th edition) non-small cell lung cancer (NSCLC).
    Methods Patients diagnosed with NSCLC from 2010 to 2015 in the Surveillance, Epidemiology, and End Results (SEER) database were analyzed retrospectively, excluding those with M1b stage disease. Two data sets were collected: data 1 (patients with non-M1b stage NSCLC, n=47 392) was used to construct the MPE diagnostic model, and data 2 (patients with M1a stage NSCLC and MPE, n=2 422) was used to construct the prognostic model. Least absolute shrinkage and selection operator (LASSO) regression was used to select feature variables, and cases were split between the training and validation sets at a ratio of 7:3. Models were built with eight machine learning algorithms. Evaluation metrics included accuracy, precision, recall, F1 score, area under the ROC curve (AUC), decision curve analysis (DCA), the calibration curve, and the precision-recall (PR) curve, with ROC-AUC as the primary metric.
    Results The incidence of MPE in patients with non-M1b stage NSCLC was 5.12%, and the 1-year survival rate of patients with MPE was 32.5%. LASSO regression identified 9 diagnosis-related variables and 12 prognosis-related variables. The AUC values of the models built with all eight machine learning algorithms exceeded 0.70. Among the diagnostic models, the random forest model performed best (training set AUC=0.908, validation set AUC=0.897); among the prognostic models, the XGBoost model performed best (training set AUC=0.905, validation set AUC=0.875). The other evaluation metrics were good and evenly balanced. SHAP feature importance analysis showed that tumor size, lymph node metastasis, and histological type were important factors in the occurrence of MPE, and chemotherapy was the most prominent prognostic factor.
    Conclusion The random forest diagnostic model constructed in this study can effectively predict the risk of MPE in patients with non-M1b stage NSCLC, and the XGBoost prognostic model can predict the prognosis of M1a-stage NSCLC patients with concurrent MPE.
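
The threshold-based metrics and the ROC-AUC named in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are assumptions made only for this example, and the abstract does not say which software the study used. ROC-AUC is computed here through its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. The loop is O(n_pos * n_neg), which is
    fine for an illustration but slow for data sets of tens of thousands.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = MPE present, 0 = MPE absent.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.65, 0.4, 0.5, 0.1, 0.8, 0.3]
# Assumed decision threshold of 0.5 for turning scores into hard predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

On this toy input every threshold-based metric equals 0.75 and the ROC-AUC is 15/16 = 0.9375. Unlike the other four metrics, ROC-AUC does not depend on the chosen threshold.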

     
