Mammography-based artificial intelligence (AI) software achieved higher areas under the curve (AUCs) for breast-level and lesion-level assessments than unassisted expert readers, according to a new study. However, the researchers cautioned that breast- and lesion-level AI evaluations may differ.
For the retrospective study, recently published in European Radiology, researchers compared AI software (Lunit Insight MMG V1.1.7.1, Lunit) to evaluations by 1,258 clinicians who participated in a Personal Performance in Mammographic Screening (PERFORMS) quality assurance program. The total cohort included 882 non-malignant breasts and 318 malignant breasts (328 total cancer lesions), according to the study.
Based on the AI model’s suspicion-of-malignancy scores, which range from 0 to 100, the study authors set the model thresholds for matching average clinician readers at > 10.5 for sensitivity and > 4.5 for specificity, whereas the AI developer’s recommended recall threshold was > 10.
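The threshold logic described above can be sketched as follows. This is an illustrative example only; the function name and structure are assumptions for demonstration and do not reflect the Lunit software's actual interface, though the threshold values are taken from the study.

```python
# Illustrative sketch: applying recall thresholds to an AI
# suspicion-of-malignancy score (0-100). Threshold values are those
# reported in the study; the code structure is hypothetical.

SENSITIVITY_MATCH = 10.5   # > 10.5 matches average reader sensitivity
SPECIFICITY_MATCH = 4.5    # > 4.5 matches average reader specificity
DEVELOPER_RECALL = 10.0    # developer-recommended recall cutoff

def recall_decision(score: float, threshold: float = DEVELOPER_RECALL) -> bool:
    """Return True if an exam with this suspicion score would be recalled."""
    if not 0 <= score <= 100:
        raise ValueError("suspicion score must be between 0 and 100")
    return score > threshold

# A score of 8 triggers recall at the specificity-matching threshold
# but not at the developer-recommended one.
print(recall_decision(8.0, SPECIFICITY_MATCH))  # True
print(recall_decision(8.0))                     # False
```

The choice of threshold trades sensitivity against specificity: lowering it recalls more exams (catching more cancers but raising false positives), which is why the study evaluated the software at multiple operating points.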
The researchers found a statistically significant decrease from the AI software’s breast-level AUC (94.2 percent) to its lesion-level AUC (92.9 percent). However, they noted that the AI software outperformed clinician assessments at both levels (87.8 percent breast-level AUC and 85.1 percent lesion-level AUC for clinicians).
Yet when comparing breast- and lesion-level sensitivity at the matched-specificity threshold of > 4.5, the study authors noted 92.1 percent breast-level sensitivity and 90.9 percent lesion-level sensitivity. While the AI accurately detected and recalled 273 lesions, the software missed recalls on 30 lesions, according to the researchers.
“Our results suggest that AI’s diagnostic performance during mammography is similar or supersedes that of humans, but variation exists in its image and lesion-level classification of malignancies,” noted lead study author Adnan Gan Taib, a research fellow and Ph.D. student affiliated with the School of Medicine at the University of Nottingham in Nottingham, U.K., and colleagues.
Three Key Takeaways
- AI outperforms clinicians in AUC. The AI software demonstrated higher diagnostic accuracy than unassisted clinicians, with breast-level AUC at 94.2 percent and lesion-level AUC at 92.9 percent, outperforming clinicians (87.8 percent and 85.1 percent, respectively).
- Variation between breast- and lesion-level assessments. Although AI performed well overall, discrepancies were observed between breast- and lesion-level analyses, including five cases in which lesion-level AI missed lesions that breast-level AI correctly identified.
- High sensitivity overall but 30 missed recalls in lesion-level analysis. At the matched-specificity threshold (> 4.5), AI achieved 92.1 percent breast-level sensitivity and 90.9 percent lesion-level sensitivity, correctly recalling 273 lesions but missing 30, highlighting both the potential and the limitations of clinical use.
The study authors also pointed out discordant scores between AI breast- and lesion-level evaluations in five cases involving a total of eight lesions. While the AI would have accurately recalled all five cases with the breast-level assessment, the researchers said it failed to localize half of the lesions and would not have recalled five of the eight lesions at the lesion level.
“Lesion-level AI analyses are seldom reported in the literature, but they could have implications on the human-AI relationship during assisted mammography reading, particularly in cases where there is discordance. An AI tool that can report at the lesion level accurately provides positive insight into its ‘thought’ process, which is particularly important as we move towards the prospective implementation of AI …,” added Taib and colleagues.
(Editor’s note: For related content, see “New Study Examines Key Factors with False Negatives on AI Mammography Analysis,” “Emerging AI Mammography Model May Enhance Clarity for Initial BI-RADS 3 and 4 Classifications” and “Mammography AI Platform for Five-Year Breast Cancer Risk Prediction Gets FDA De Novo Authorization.”)
Regarding study limitations, the authors acknowledged the retrospective nature of the research, test sets enriched with cancer cases, and the lack of prior images for the AI and radiologist evaluations.