Does AI offer more accurate breast cancer risk stratification than BI-RADS breast density assessments?
In a new retrospective study, published in JAMA Network Open, researchers compared the deep learning model Mirai to BI-RADS breast density assessments for predicting future breast cancers within a five-year period. The cohort comprised 123,091 mammograms from 67,019 women (median age of 58), and 41.4 percent of the reviewed mammograms were from women with dense breasts, according to the study.
The study authors found that the deep learning model achieved an area under the receiver operating characteristic curve (AUROC) of 71 percent for predicting future breast cancer, compared with 53 percent for breast density.
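For readers less familiar with the metric, an AUROC can be read as the probability that a randomly chosen woman who develops cancer receives a higher risk score than a randomly chosen woman who does not, so 50 percent is no better than chance. The following is a minimal sketch of that pairwise definition, not the study's evaluation code. The function name `auroc` and the toy scores and labels are invented purely for illustration and are not data from the study.

```python
def auroc(scores, labels):
    """Pairwise AUROC: share of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): risk scores and outcomes,
# where 1 = cancer diagnosed during follow-up and 0 = no cancer.
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
toy_labels = [1, 0, 1, 0, 0, 0]

toy_auroc = auroc(toy_scores, toy_labels)
print(f"Toy AUROC: {toy_auroc:.2f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration. On a cohort the size of the one described here, a rank-based implementation from a statistics library would be the practical choice.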
The researchers noted that adding breast density classification to the deep learning model produced no significant difference in prognostic accuracy. With the deep learning model, they reported an AUROC of 70 percent for women with dense breasts in comparison to 72 percent for those with non-dense breasts.
“The results of our study align with recommendations from the U.S. Preventive Services Task Force, which concluded that the evidence is insufficient to recommend supplemental magnetic resonance imaging (MRI) or ultrasonography screening in women with dense breasts and negative mammography results. Our results support this position by showing that breast density alone provides limited risk stratification, whereas DL-based models can more precisely identify patients most likely to benefit from additional screening,” noted lead study author Leslie R. Lamb, MD, MSc, an assistant professor of radiology at Harvard Medical School, and colleagues.
Noting the consistency of the deep learning model across cancer subtypes, the researchers found that Mirai demonstrated a 71 percent AUROC for predicting invasive breast cancer and a 70 percent AUROC for predicting ductal carcinoma in situ in comparison to 53 percent and 56 percent, respectively, for breast density.
• Deep learning outperforms breast density for risk prediction. The Mirai AI model demonstrated substantially higher accuracy for 5-year breast cancer risk (AUROC 71 percent) compared to BI-RADS breast density (53 percent), indicating that density alone is a weak standalone factor in risk stratification.
• No added value from combining breast density with AI. Incorporating BI-RADS breast density into the deep learning model did not improve prognostic performance, suggesting AI-derived risk scores already capture relevant imaging features beyond density.
• Consistent performance across subgroups and cancer types. The AI model maintained superior predictive accuracy across dense and non-dense breasts, racial/ethnic groups, and cancer subtypes (invasive and DCIS), highlighting its potential to improve individualized risk assessment and better guide supplemental screening decisions.
The study authors also noted consistently higher accuracy in risk prediction with the deep learning model across racial and ethnic groups. For White women, the deep learning model had an AUROC 17 percentage points higher than that of breast density (70 percent vs. 53 percent). The researchers noted that Mirai provided an AUROC 15 percentage points higher for Black women (72 percent vs. 57 percent) and Asian women (69 percent vs. 54 percent), and 13 percentage points higher for Hispanic women (69 percent vs. 56 percent).
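The subgroup gaps are simple subtractions of the reported AUROC values, which makes them differences in percentage points rather than relative increases. As a quick check of that arithmetic, here is a short sketch using only the figures quoted above. The variable names are illustrative.

```python
# Reported AUROC values in percent, as (deep learning model, breast density).
reported = {
    "White": (70, 53),
    "Black": (72, 57),
    "Asian": (69, 54),
    "Hispanic": (69, 56),
}

# Absolute gap in percentage points for each group.
gaps = {group: dl - density for group, (dl, density) in reported.items()}

for group, gap in gaps.items():
    print(f"{group}: {gap} percentage points")
```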
“The model provided more accurate risk discrimination across all breast density categories and demographic subgroups. Importantly, DL risk scores remained predictive in women with non-dense breasts, suggesting that current binary density-based policies may both under identify high-risk women without dense breasts and overidentify low-risk women with dense breasts for supplemental imaging,” added Lamb and colleagues.
Beyond the inherent limitations of a single-center retrospective study, the authors acknowledged that the imaging came from one vendor system and that over 82 percent of the cohort were White women. The researchers also conceded a small number of false-negative results and noted that the deep learning model was developed on two-dimensional full-field digital mammography.