April 2026
Clostridioides difficile infection (CDI) presents a significant challenge in patients with inflammatory bowel disease (IBD) owing to high rates of recurrence and complications. Predicting recurrent CDI (rCDI) in patients with IBD is crucial for implementing targeted interventions to improve patient outcomes. This study aimed to develop and validate a predictive model (RecurCDI-IBD) using supervised machine learning to identify patients with IBD at high risk of developing rCDI.
Data were collected from adult patients with IBD who were diagnosed with CDI between 2013 and 2021; inclusion required a confirmed diagnosis of CDI and a history of IBD. A gradient boosting model (XGBoost) was trained as a binary classifier. Input features comprised demographic data (age and sex), clinical data (IBD subtype, medication use, and comorbidities), and laboratory data. The primary outcome was the occurrence of rCDI within 60 days of the initial CDI episode.
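As an illustration of how the binary outcome described above might be derived from episode dates, the following is a minimal sketch and not the study's actual labeling code. The function name `label_rcdi`, its parameters, and the choice of an inclusive 60-day upper bound are assumptions made for this example. The clinical criteria used to confirm a recurrent episode are not specified here and are not modeled.

```python
from datetime import date, timedelta


def label_rcdi(index_date, later_positive_dates, window_days=60):
    """Return 1 if any later CDI episode falls within the window, else 0.

    Hypothetical helper: `index_date` is the date of the initial CDI
    episode and `later_positive_dates` holds dates of subsequent episodes.
    The window is assumed to be (index_date, index_date + window_days],
    i.e. the upper bound is inclusive.
    """
    cutoff = index_date + timedelta(days=window_days)
    return int(any(index_date < d <= cutoff for d in later_positive_dates))


# Example: an episode 45 days after the initial one counts as recurrence,
# while an episode 61 days later falls outside the assumed window.
initial = date(2020, 1, 1)
label_in_window = label_rcdi(initial, [initial + timedelta(days=45)])
label_out_of_window = label_rcdi(initial, [initial + timedelta(days=61)])
```

A label built this way would serve as the target of the binary classifier; whether day 60 itself is included is a definitional choice that should follow the study protocol.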
The RecurCDI-IBD model achieved an accuracy of 80.05% and an area under the curve (AUC) of 0.88 for predicting rCDI. Key predictive features included IBD subtype, sex, specific medications (such as steroids and anti-TNF agents), and comorbidities (such as chronic pulmonary and renal disease).
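For readers less familiar with the two reported metrics, the sketch below shows how accuracy and an area under the curve could be computed from true labels and predicted risk scores. It assumes the AUC refers to the receiver operating characteristic curve and uses a 0.5 decision threshold for accuracy; neither is stated in the text. The data are toy values for illustration only, not study data, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC via its rank interpretation.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count 0.5).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (illustrative values only).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

toy_accuracy = accuracy(y_true, y_pred)
toy_auc = roc_auc(y_true, scores)
```

Accuracy depends on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the two figures are usually reported together.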
The RecurCDI-IBD model demonstrates good discriminatory ability with balanced precision and recall in identifying patients with IBD at higher risk for rCDI. These findings highlight the potential of data-driven approaches to support clinical risk assessment. Further studies incorporating larger and more diverse cohorts and prospective external validation are needed to confirm generalizability and optimize clinical applicability.