Meta-analysis finds predictive models for post-ERCP pancreatitis show moderate accuracy

Frontiers in Medicine. Published April 1, 2026.

Key Takeaway

Consider predictive models for PEP as investigational tools requiring validation before clinical use.

This systematic review and meta-analysis evaluated the accuracy, reliability, and risk of bias of predictive models for post-ERCP pancreatitis (PEP). The analysis included 23 studies (21 for model development, 2 for external validation) involving patients undergoing ERCP. The primary outcome was model performance, assessed through metrics including area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and calibration.

The pooled AUC for externally validated models was 0.79 (95% CI: 0.75–0.83). Machine learning models showed a higher AUC (0.84) than traditional logistic regression models (0.76). Common predictive factors across models included difficult cannulation, female sex, pancreatic duct dilation, and a history of pancreatitis. The mean events per variable (EPV) across studies was 10.2 (range 2.2 to 22.4).
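For readers unfamiliar with EPV, the sketch below shows how the ratio is typically computed: the number of outcome events (here, PEP cases) divided by the number of candidate predictors considered during model development. The function name and the example counts are hypothetical and are not taken from the review. A low EPV is generally read as a sign of overfitting risk, and a value of about 10 is a commonly cited rule of thumb rather than a strict threshold.

```python
def events_per_variable(n_events: int, n_candidate_predictors: int) -> float:
    """Return events per variable (EPV) for a prediction model.

    n_events: number of outcome events in the development cohort
              (for example, patients who developed PEP).
    n_candidate_predictors: number of candidate predictors considered,
              not only those retained in the final model.
    """
    if n_candidate_predictors <= 0:
        raise ValueError("n_candidate_predictors must be positive")
    return n_events / n_candidate_predictors


# Hypothetical cohort (illustrative numbers, not from the review):
# 60 PEP cases and 8 candidate predictors.
epv = events_per_variable(60, 8)
print(f"EPV = {epv:.1f}")
```

Under these assumed counts the EPV falls below 10, which is the kind of value that would typically prompt closer scrutiny of the model's sample size.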

Safety and tolerability data were not reported. Key limitations include variability in model performance, lack of external validation for most models, and significant bias in many of the included studies. For practice, predictive models show potential for improving patient risk stratification, but their current clinical applicability is limited. They require further external validation and refinement before they can be reliably implemented.

Original Abstract

Background and aims: Post-ERCP pancreatitis (PEP) is the most common complication following ERCP, leading to significant clinical and economic consequences. Predictive models for PEP can help identify high-risk patients and guide preventive strategies. However, the performance of these models varies, and a comprehensive evaluation is lacking. This study aims to assess the accuracy, reliability, and risk of bias in existing predictive models for PEP.

Methods: A comprehensive search was conducted across five databases (PubMed, Embase, Web of Science, Cochrane Library, and CNKI) for studies published until January 2025. Studies that developed or validated predictive models for PEP were included. Models with external validation sets were included in a meta-analysis. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and calibration. A random-effects meta-analysis was performed, with heterogeneity assessed using I² statistics. Data extraction and risk of bias were conducted using a standardized template combining the CHARMS and PROBAST tools.

Results: Twenty-three studies (21 model development studies and 2 external validation studies) were included, presenting 21 predictive models for PEP. Nine models incorporated external validation, with one study recalibrating an existing model and another externally validating two prior models. The mean events per variable (EPV) across studies was 10.2 (2.2 to 22.4). The pooled AUC for externally validated models was 0.79 (95% CI: 0.75–0.83). Machine learning models demonstrated higher AUC (0.84) than traditional logistic regression models (0.76). Common predictive factors included difficult cannulation, female sex, pancreatic duct dilation, and a history of pancreatitis.

Conclusions: Predictive models for PEP show potential for improving patient risk stratification. However, variability in model performance, lack of external validation, and significant bias in many studies limit their clinical applicability. Further external validation, model refinement, and improved bias control are essential for broader clinical implementation.

Systematic Review Registration: https://www.crd.york.ac.uk/PROSPERO/view/CRD42024626168, identifier CRD42024626168.
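
The Methods describe a random-effects meta-analysis with heterogeneity assessed by I². The abstract does not say which estimator was used or on which scale AUCs were pooled, so the following is only a minimal sketch under stated assumptions: it uses the DerSimonian-Laird estimator (a common choice, assumed here) and pools on the raw AUC scale, although a logit transform is often applied in practice. The function name and all input values are hypothetical and are not data from the review.

```python
import math


def pool_random_effects(estimates, std_errors):
    """Pool study estimates with a DerSimonian-Laird random-effects model.

    Returns (pooled estimate, 95% CI as a tuple, tau^2, I^2 in percent).
    The estimator choice is an assumption; the abstract does not name one.
    """
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q and its degrees of freedom.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1

    # Between-study variance tau^2, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # I^2: share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return pooled, ci, tau2, i2


# Hypothetical AUCs and standard errors (illustrative only).
aucs = [0.75, 0.80, 0.82]
ses = [0.020, 0.030, 0.025]
pooled, ci, tau2, i2 = pool_random_effects(aucs, ses)
print(f"Pooled AUC {pooled:.3f} (95% CI {ci[0]:.3f} to {ci[1]:.3f}), I2 = {i2:.1f}%")
```

Compared with a fixed-effect model, the random-effects weights are more even across studies when tau² is large, which widens the confidence interval to reflect between-study variability.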