Abstract:
Objective To construct risk prediction models for liver cancer in patients with chronic hepatitis C using seven machine learning algorithms and to select the optimal model.
Methods A total of 236 patients with chronic hepatitis C were enrolled and divided into a case group and a control group according to whether liver cancer occurred. Prediction models were constructed with seven machine learning algorithms: classification and regression tree, random forest, gradient boosting decision tree, extreme gradient boosting (XGBoost), logistic regression, K-nearest neighbor, and support vector machine. The Shapley additive explanations (SHAP) algorithm was used to interpret the best-performing model.
Results Among the seven models, the XGBoost model had the best overall prediction performance (accuracy 0.933, sensitivity 0.775, specificity 0.960, area under the ROC curve 0.956, F1 score 0.764). The SHAP analysis indicated that AFP, age, AST, diabetes, BMI, PLT, ALT, liver cysts, FIB-4, and gender contributed to the model's decisions, suggesting that they are risk factors for liver cancer in patients with chronic hepatitis C.
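The threshold-based metrics reported above (accuracy, sensitivity, specificity, F1 score) all derive from the same confusion matrix. The following is a minimal sketch of how they relate, assuming binary labels where 1 denotes liver cancer. The function names and the toy labels are illustrative assumptions and are not taken from the study's data or code.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # cases correctly predicted as liver cancer
    fp: int  # controls wrongly predicted as liver cancer
    tn: int  # controls correctly predicted as no liver cancer
    fn: int  # cases wrongly predicted as no liver cancer


def confusion_counts(y_true, y_pred):
    """Tally the four confusion-matrix cells for binary labels (1 = case)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(c):
    """Compute accuracy, sensitivity, specificity, and F1 from the counts."""
    sensitivity = safe_div(c.tp, c.tp + c.fn)  # recall for the case group
    specificity = safe_div(c.tn, c.tn + c.fp)  # recall for the control group
    precision = safe_div(c.tp, c.tp + c.fp)
    return {
        "accuracy": safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


# Toy labels for illustration only (hypothetical, not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(confusion_counts(y_true, y_pred))
```

Because cases are usually far fewer than controls in such cohorts, accuracy and specificity can be high while sensitivity and F1 remain lower, which is why the metrics are reported together rather than accuracy alone.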
Conclusion This study developed an interpretable machine learning model based on the XGBoost algorithm that may serve as a useful reference for individualized monitoring of liver cancer in patients with chronic hepatitis C.