Exploring risk factors for liver cancer in patients with chronic hepatitis C using machine learning prediction models

Abstract:

Objective: To construct risk prediction models for liver cancer in patients with chronic hepatitis C using seven different machine learning algorithms, and to select the best-performing model.

Methods: A total of 236 patients diagnosed with chronic hepatitis C at the Union Hospital of Fujian Medical University between January 2016 and December 2023 were enrolled and divided into a case group and a control group according to whether liver cancer occurred. Prediction models were built with seven machine learning algorithms: Classification and Regression Tree (CART), Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGBoost), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM). The Shapley Additive Explanations (SHAP) algorithm was used to interpret the best prediction model.

Results: Among the seven models, the XGBoost model had the best overall predictive performance (accuracy 0.933, sensitivity 0.775, specificity 0.960, area under the ROC curve 0.956, F1 score 0.764). SHAP analysis indicated that AFP, age, AST, diabetes, BMI, PLT, ALT, liver cysts, FIB-4, and sex contributed most to the model's decisions, suggesting that these factors are risk factors for liver cancer in patients with chronic hepatitis C.

Conclusion: This study constructed an interpretable machine learning model based on the XGBoost algorithm, which offers a useful reference for individualized liver cancer monitoring in patients with chronic hepatitis C.
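
The performance measures reported in the Results (accuracy, sensitivity, specificity, area under the ROC curve, and F1 score) can all be derived from a model's predictions on labeled cases. The following is a minimal sketch of the standard definitions in plain Python, not the authors' evaluation code. The function names `confusion_metrics` and `roc_auc` and the example counts are hypothetical and are not taken from the study.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: cases (liver cancer) predicted positive/negative.
    tn/fp: controls predicted negative/positive.
    Assumes every denominator is non-zero.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),      # recall on the case group
        "specificity": tn / (tn + fp),      # recall on the control group
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of precision and sensitivity
    }


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive receives a higher
    score than a randomly chosen negative, with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical counts for illustration only; NOT the study's data.
    for name, value in confusion_metrics(tp=8, fp=5, tn=45, fn=2).items():
        print(f"{name}: {value:.3f}")
    # Hypothetical labels and predicted probabilities for illustration only.
    print(f"auc: {roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]):.3f}")
```

When controls greatly outnumber cases, accuracy is dominated by specificity, so it can remain high even when sensitivity is noticeably lower. This is one reason the abstract reports sensitivity and F1 alongside accuracy.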

     
