Can AI do a better job than radiologists of ruling out the need for subsequent biopsy in breast ultrasound interpretation while offering comparable sensitivity for breast cancer in pregnant and lactating women?
For the retrospective study, recently published in European Radiology, researchers evaluated use of an AI ultrasound software (Koios DS, version 3.7.0, Koios Medical) in a review of ultrasound data from 504 women (mean age of 33). Out of a total of 639 breast ultrasound findings, only five findings were malignant, according to the study.
The study authors found that radiologist reading and AI assessment both provided an 80 percent sensitivity rate, but the AI software recommended biopsy for benign lesions at a rate approximately 16 percentage points higher than radiologist interpretation (37.1 percent vs. 21 percent).
“ … In the full cohort, the AI tool assigned higher BI-RADS categories to a greater proportion of benign lesions, including pregnancy- and lactation-related entities, which could bias less experienced radiologists toward biopsy and represents a potential unintended consequence of (decision support),” noted lead study author Dennis Dwan, a clinical fellow in radiology at the Massachusetts General Hospital in Boston, and colleagues.
Pointing out that the training data for the AI software did not include fluid collections or skin lesions, the researchers performed a sub-analysis of the ultrasound study data with exclusion of galactoceles, fluid collections and skin lesions.
In this sub-analysis, the AI software's overall rate of biopsy recommendations was similar to that of radiologist assessment (27.4 percent vs. 23.8 percent), according to the study authors. The researchers also found that the AI software assigned BI-RADS 2 about 20 percentage points more often (53.1 percent vs. 32.9 percent) and BI-RADS 3 nearly 24 percentage points less often (19.5 percent vs. 43.3 percent).
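For readers who want to check the arithmetic behind these comparisons, the following is a minimal Python sketch that separates the absolute gap (percentage points) from the relative change for each AI versus radiologist pair quoted above. The percentages come from the article; the function names and labels are illustrative and not taken from the study.

```python
# Figures quoted in the article, as (AI percent, radiologist percent).
# The dictionary labels are illustrative shorthand, not study terminology.
comparisons = {
    "biopsy recommended, benign lesions, full cohort": (37.1, 21.0),
    "overall biopsy recommendations, sub-analysis": (27.4, 23.8),
    "BI-RADS 2 assessments, sub-analysis": (53.1, 32.9),
    "BI-RADS 3 assessments, sub-analysis": (19.5, 43.3),
}


def point_difference(ai, radiologist):
    """Absolute gap in percentage points (AI minus radiologist)."""
    return round(ai - radiologist, 1)


def relative_change(ai, radiologist):
    """Gap expressed as a percent of the radiologist rate."""
    return round(100.0 * (ai - radiologist) / radiologist, 1)


if __name__ == "__main__":
    for label, (ai, rad) in comparisons.items():
        print(
            f"{label}: {point_difference(ai, rad):+} points, "
            f"{relative_change(ai, rad):+}% relative"
        )
```

The distinction matters for the full-cohort biopsy figure: a gap of about 16 percentage points on a 21 percent baseline is a much larger change in relative terms.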
“Importantly, the AI tool classified a greater proportion of cases as BI-RADS 2 compared to radiologists, which could have meaningful implications for reducing unnecessary follow-up imaging, patient anxiety, and healthcare burden. Although other studies have shown a statistically significant decrease in biopsy recommendations when using the AI tool compared to radiologists, our findings are consistent with the broader trend toward improved specificity for benign lesions,” posited Dwan and colleagues.
Three Key Takeaways
• Matched sensitivity, but higher false-positive biopsy rate in the full cohort. Both AI and radiologists achieved 80 percent sensitivity for breast cancer detection, but in the full cohort, the AI software recommended biopsy for benign lesions at a rate roughly 16 percentage points higher (37.1 percent vs. 21 percent). This suggests the AI may lead to unnecessary procedures, which is particularly concerning in a pregnant/lactating population in whom benign, pregnancy-related findings are common.
• AI performance improves substantially when pregnancy-specific lesions are excluded. After removing galactoceles, fluid collections and skin lesions from analysis, the AI software's biopsy recommendation rate aligned much more closely with that of radiologists (27.4 percent vs. 23.8 percent), and it classified more lesions as BI-RADS 2 (53.1 percent vs. 32.9 percent) while reducing BI-RADS 3 assessments (19.5 percent vs. 43.3 percent). This points to a meaningful limitation in the AI software’s training data, which did not include these pregnancy- and lactation-related entities.
• Generalizability remains limited for this patient population. Given that galactoceles, fluid collections and skin lesions are common and clinically relevant in pregnant and lactating patients, excluding them from the sub-analysis may inflate the AI software’s apparent performance among these patients. Clinicians should be cautious about over-relying on this AI tool in routine obstetric/lactation breast imaging until its training data is expanded to include these entities.
However, the study authors cautioned that the exclusion of galactoceles, fluid collections and skin lesions may hamper the reliability of the AI software among pregnant and lactating women.
“ … These excluded entities are common and clinically relevant in pregnant and lactating patients, and their removal limits generalizability and may overestimate the apparent performance of the AI tool in routine clinical practice,” noted Dwan and colleagues.
In regard to study limitations, the authors acknowledged the small number of malignancies, the retrospective application of AI and the AI software's higher rate of biopsy recommendations for benign lesions in the overall cohort.
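To illustrate why the small number of malignancies limits the sensitivity estimate, here is a hedged Python sketch of a 95 percent Wilson score interval. It assumes that 80 percent sensitivity with five malignant findings corresponds to four of five malignancies detected; that count is inferred from the figures above, and this interval is not reported by the study.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score confidence interval for a proportion (z=1.96 for ~95%)."""
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    )
    return center - half_width, center + half_width


# Assumption: 4 of 5 malignant findings detected (inferred from 80 percent
# sensitivity and five malignancies; not stated explicitly in the article).
low, high = wilson_interval(4, 5)

if __name__ == "__main__":
    print(f"Sensitivity 4/5: 95% Wilson interval {low:.1%} to {high:.1%}")
```

Under that assumption, the interval is wide, running from below 40 percent to above 95 percent, so the matched sensitivity figures should be read as rough estimates.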