Meta-analysis finds predictive models for post-ERCP pancreatitis show moderate accuracy
Frontiers in Medicine
Published April 1, 2026
This systematic review and meta-analysis evaluated the accuracy, reliability, and risk of bias of predictive models for post-ERCP pancreatitis (PEP). The analysis included 23 studies (21 for model development, 2 for external validation) involving patients undergoing ERCP. The primary outcome was model performance, assessed through metrics including area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and calibration.
The pooled AUC for the externally validated models was 0.79 (95% CI: 0.75–0.83). Machine learning models had a higher mean AUC (0.84) than traditional logistic regression models (0.76). Predictive factors common across models included difficult cannulation, female sex, pancreatic duct dilation, and a history of pancreatitis. The mean events per variable (EPV) across studies was 10.2, ranging widely from 2.2 to 22.4.
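EPV is the number of outcome events (here, PEP cases) divided by the number of candidate predictor parameters considered during model development. Low values are generally taken as a warning sign for overfitting; 10 is a commonly cited rule of thumb, not a threshold stated in this review. A minimal sketch of the calculation follows. The function name and the example counts are hypothetical and are not taken from any included study.

```python
def events_per_variable(n_events: int, n_candidate_predictors: int) -> float:
    """Return events per variable (EPV) for a prediction model.

    n_events: number of patients who experienced the outcome.
    n_candidate_predictors: number of candidate predictor parameters
        considered during model development.
    """
    if n_events < 0:
        raise ValueError("n_events must be non-negative")
    if n_candidate_predictors <= 0:
        raise ValueError("n_candidate_predictors must be positive")
    return n_events / n_candidate_predictors


# Hypothetical development cohort: 120 PEP cases, 12 candidate predictors.
example_epv = events_per_variable(120, 12)
```

With the same 12 candidate predictors, a cohort with only 30 events would have an EPV of 2.5, close to the lower end of the range reported above.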
Safety and tolerability data were not reported. Key limitations include variability in model performance, a lack of external validation for most models, and significant bias in many of the included studies. For practice, predictive models show potential for improving patient risk stratification, but their current clinical applicability is limited. They require further external validation and refinement before they can be reliably implemented.
Researchers reviewed 23 studies that developed or tested computer models to predict a patient's risk of developing pancreatitis after an ERCP procedure. ERCP (endoscopic retrograde cholangiopancreatography) is an endoscopic procedure used to diagnose and treat problems of the bile and pancreatic ducts. The models used patient information like age, medical history, and details about the procedure to estimate risk.
The review found that the models, when tested on new groups of patients, had moderate accuracy. Machine learning models were slightly more accurate than traditional statistical models. Common factors the models used to predict higher risk included being female, having a history of pancreatitis, and having a difficult procedure.
It's important to know this research is about the models themselves, not a new treatment. The studies had significant limitations, including a lack of real-world testing for many models and potential bias. This means doctors cannot confidently use these models in clinics yet. The findings suggest these tools have potential, but they need much more development and validation to be reliable for patient care.
What this means for you: Computer models for predicting pancreatitis risk after ERCP show promise but are not ready for routine clinical use.
Original Abstract
Background and aims: Post-ERCP pancreatitis (PEP) is the most common complication following ERCP, leading to significant clinical and economic consequences. Predictive models for PEP can help identify high-risk patients and guide preventive strategies. However, the performance of these models varies, and a comprehensive evaluation is lacking. This study aims to assess the accuracy, reliability, and risk of bias in existing predictive models for PEP.
Methods: A comprehensive search was conducted across five databases (PubMed, Embase, Web of Science, Cochrane Library, and CNKI) for studies published until January 2025. Studies that developed or validated predictive models for PEP were included. Models with external validation sets were included in a meta-analysis. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and calibration. A random-effects meta-analysis was performed, with heterogeneity assessed using I² statistics. Data extraction and risk of bias were conducted using a standardized template combining the CHARMS and PROBAST tools.
Results: Twenty-three studies (21 model development studies and 2 external validation studies) were included, presenting 21 predictive models for PEP. Nine models incorporated external validation, with one study recalibrating an existing model and another externally validating two prior models. The mean events per variable (EPV) across studies was 10.2 (2.2 to 22.4). The pooled AUC for externally validated models was 0.79 (95% CI: 0.75–0.83). Machine learning models demonstrated higher AUC (0.84) than traditional logistic regression models (0.76). Common predictive factors included difficult cannulation, female sex, pancreatic duct dilation, and a history of pancreatitis.
Conclusions: Predictive models for PEP show potential for improving patient risk stratification. However, variability in model performance, lack of external validation, and significant bias in many studies limit their clinical applicability. Further external validation, model refinement, and improved bias control are essential for broader clinical implementation.
Systematic Review Registration: https://www.crd.york.ac.uk/PROSPERO/view/CRD42024626168, identifier CRD42024626168.
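The abstract describes a random-effects meta-analysis with heterogeneity summarized by I², but does not say which estimator was used. As an illustration only, the sketch below pools per-study AUCs with the DerSimonian-Laird estimator, which is one common choice and an assumption here. It pools on the raw AUC scale for simplicity; published analyses often pool on a logit scale instead. The function name, the result type, and all input values are hypothetical and are not data from the review.

```python
import math
from typing import NamedTuple, Sequence


class PooledResult(NamedTuple):
    pooled: float        # random-effects pooled estimate
    ci_low: float        # lower bound of the 95% confidence interval
    ci_high: float       # upper bound of the 95% confidence interval
    tau2: float          # estimated between-study variance
    i2_percent: float    # I² heterogeneity statistic, in percent


def pool_random_effects(estimates: Sequence[float],
                        std_errors: Sequence[float]) -> PooledResult:
    """Pool study estimates with the DerSimonian-Laird random-effects method."""
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching standard errors")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q and the DerSimonian-Laird between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # I²: share of total variability attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau2 to each study's variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum_w_star
    se_pooled = math.sqrt(1.0 / sum_w_star)

    return PooledResult(pooled, pooled - 1.96 * se_pooled,
                        pooled + 1.96 * se_pooled, tau2, i2)


# Hypothetical per-study AUCs and standard errors, for illustration only.
example = pool_random_effects([0.70, 0.80, 0.90], [0.02, 0.02, 0.02])
```

When the studies disagree by more than their standard errors can explain, as in the hypothetical inputs above, `tau2` becomes positive, the confidence interval widens, and I² rises toward 100%.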