Abstract:
Objective: To construct risk prediction models for liver cancer in patients with chronic hepatitis C using seven machine learning algorithms and to identify the optimal model.
Methods: A total of 236 patients with chronic hepatitis C diagnosed at the Union Hospital of Fujian Medical University between January 2016 and December 2023 were enrolled as research subjects. They were divided into a case group and a control group according to whether liver cancer developed. Prediction models were constructed with seven machine learning algorithms: Classification and Regression Tree (CART), Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGBoost), Logistic Regression (LR), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM). The Shapley Additive Explanations (SHAP) algorithm was used to interpret the best-performing model.
Results: Among the seven models, the XGBoost model had the best overall predictive performance (accuracy 0.933, sensitivity 0.775, specificity 0.960, area under the ROC curve [AUC] 0.956, F1 score 0.764). SHAP analysis indicated that AFP, age, AST, diabetes, BMI, PLT, ALT, liver cysts, FIB-4, and gender contributed most to the model's predictions, suggesting that these factors may be risk factors for liver cancer in patients with chronic hepatitis C.
Conclusion: This study constructed an interpretable machine learning model based on the XGBoost algorithm. The model may serve as a useful reference for individualized liver cancer surveillance in patients with chronic hepatitis C.
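The performance figures above (accuracy, sensitivity, specificity, F1, AUC) follow standard definitions. The minimal Python sketch below computes them from a 2x2 confusion matrix. It estimates the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The function names, counts, and scores are hypothetical illustrations and are not the study's data or code.

```python
def _safe_div(num: float, den: float) -> float:
    """Return num/den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Confusion-matrix metrics for a binary classifier (positive = liver cancer)."""
    precision = _safe_div(tp, tp + fp)      # positive predictive value
    sensitivity = _safe_div(tp, tp + fn)    # true-positive rate (recall)
    return {
        "accuracy": _safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": _safe_div(tn, tn + fp),  # true-negative rate
        "precision": precision,
        # F1 is the harmonic mean of precision and sensitivity
        "f1": _safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


def roc_auc(pos_scores, neg_scores) -> float:
    """AUC as the Mann-Whitney probability P(score_pos > score_neg), ties = 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores, for illustration only.
m = classification_metrics(tp=8, fp=5, tn=45, fn=2)
print({k: round(v, 3) for k, v in m.items()})
print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.4]))  # 0.875
```

A high specificity can coexist with a modest F1 score when positive cases are relatively rare. F1 depends on precision, and precision drops as even a small fraction of the many negatives is misclassified.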
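SHAP attributes an individual prediction to input features using Shapley values from cooperative game theory. The Python sketch below computes exact Shapley values by enumerating every feature subset. It assumes that "absent" features are replaced by a single baseline value. The three-feature scoring function, its coefficients, and the baseline are hypothetical. The shap library's tree explainer, which is typically used with XGBoost, relies on a faster tree-specific algorithm and a different treatment of absent features. The example therefore illustrates the concept only and does not reproduce the study's computation. Exact enumeration is exponential in the number of features, so it is practical only for a handful of them.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x), with absent features set to baseline."""
    n = len(x)

    def value(coalition):
        # Features in the coalition keep their observed value; the rest use the baseline.
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of size k
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Weighted marginal contribution of feature i to coalition s
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk_score(z):
    """Hypothetical score: two main effects plus an interaction between z[0] and z[2]."""
    return 0.5 * z[0] + 0.2 * z[1] + 0.3 * z[0] * z[2]


x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk_score, x, baseline)
print([round(v, 3) for v in phi])  # [0.65, 0.2, 0.15]
# Efficiency: attributions sum to model(x) - model(baseline)
print(round(sum(phi), 3), round(toy_risk_score(x) - toy_risk_score(baseline), 3))
```

Each main effect is credited to its own feature. The 0.3 interaction term is split equally between features 0 and 2, and the attributions sum exactly to the gap between the prediction and the baseline prediction.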