A new meta-analysis noted the potential of MRI-based AI models for preoperative grading of hepatocellular carcinoma (HCC) but found lower sensitivity, specificity and area under the curve (AUC) for these models in external validation assessments.
For the meta-analysis, recently published in the European Journal of Radiology, researchers compared internal and external validation of MRI-based AI models for prognostic HCC grading based on a review of 18 studies.
In internal validation testing, the meta-analysis authors noted that the MRI-based AI models offered a 78 percent pooled sensitivity, 80 percent specificity and an 85 percent AUC for predicting high-grade HCC.
However, external validation assessment revealed an eight-percentage-point decline in sensitivity (70 percent), a six-percentage-point decline in specificity (74 percent) and a 10-percentage-point decline in AUC (75 percent), according to the researchers.
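The declines above are differences between pooled estimates, so they can be read either as absolute (percentage-point) drops or as relative declines. The short Python sketch below works through that arithmetic using only the figures quoted in this article. The variable and function names are illustrative assumptions and do not come from the meta-analysis itself.

```python
# Pooled estimates (in percent) as quoted in the article.
internal = {"sensitivity": 78, "specificity": 80, "auc": 85}
external = {"sensitivity": 70, "specificity": 74, "auc": 75}


def declines(internal, external):
    """Return (absolute, relative) declines per metric.

    absolute: drop in percentage points (internal minus external)
    relative: drop as a percent of the internal value, rounded to 0.1
    """
    result = {}
    for metric, before in internal.items():
        after = external[metric]
        absolute = before - after
        relative = round(100 * absolute / before, 1)
        result[metric] = (absolute, relative)
    return result


drops = declines(internal, external)
for metric, (absolute, relative) in drops.items():
    print(f"{metric}: {absolute} points lower ({relative}% relative decline)")
```

Read this way, the 10-point AUC drop corresponds to a relative decline of roughly 12 percent from the internal validation value.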
“Our findings confirm that while MRI-AI models demonstrate favorable discriminatory ability within internal validation cohorts, their performance declines significantly in external validation, highlighting substantial challenges in cross-center generalization,” noted lead study author Langshan Yang, MD, who is affiliated with the Department of Hepatobiliary Surgery in the General Surgery Center at Zhujiang Hospital and Southern Medical University in Guangzhou, China, and colleagues.
The meta-analysis authors pointed out that deep learning models offered a pooled sensitivity 16 percentage points higher than that of machine learning models (88 percent vs. 72 percent).
Three Key Takeaways
• Limited generalizability of MRI-AI models. MRI-based AI models for HCC grading show good performance in internal validation (AUC 85 percent), but clinically meaningful drops in sensitivity, specificity, and AUC occur with external validation, underscoring challenges in cross-institutional reliability.
• Deep learning improves sensitivity but not overall robustness. Deep learning models demonstrate higher sensitivity than traditional machine learning (88 percent vs. 72 percent), likely due to better capture of intratumoral heterogeneity, but the lack of a significant specificity advantage and limited external validation data may restrict clinical adoption.
• Heterogeneity and study design limit clinical translation. Predominantly retrospective data, variable definitions of high-grade HCC, and inconsistent imaging/segmentation methods contributed to the heterogeneity of the reviewed studies, highlighting the need for standardization and prospective multicenter validation before routine clinical use of the MRI-based AI models.
“(Deep learning architectures) capture more subtle patterns of intratumoral heterogeneity, such as cellular density variations, nuclear atypia, and microvascular infiltration patterns, which are often difficult to quantify using traditional radiomic features,” added Yang and colleagues.
However, the researchers noted there was no significant difference in specificity between deep learning and machine learning models, and that only eight of the reviewed studies assessed deep learning models.
Regarding limitations of the meta-analysis, the authors acknowledged that all of the reviewed studies were retrospective and that the definition of high-grade HCC varied across studies. The researchers also suggested that unassessed confounding factors, such as segmentation methodologies and image acquisition parameters, may have contributed to the significant heterogeneity among the included studies.