A new study comparing radiologist assessment and stand-alone artificial intelligence (AI) interpretation of breast ultrasound found that none of the lesions the AI classified as BI-RADS 2 were malignant and that only two of the lesions it classified as BI-RADS 3 were malignant.
For the retrospective multicenter study, recently published in Academic Radiology, researchers compared multiple radiologist assessments of breast ultrasound images and stand-alone AI assessment (Koios DS for Breast System, Koios Medical) for 530 patients (715 lesions).
The researchers found that the AI system had a 98.51 percent sensitivity rate for detecting breast cancer lesions in comparison to 98.51 percent and 97.76 percent for the two reviewing radiologists. The AI software also had a negative predictive value (NPV) of 99.48 percent, which was similar to the NPVs for the reviewing radiologists (99.58 percent and 99.32 percent), according to the study.
The breast ultrasound image shows a 7.2 mm circumscribed lesion with heterogeneous echogenicity in the upper inner quadrant of the left breast of a 46-year-old patient. It was diagnosed as papilloma with a ductal carcinoma in situ (DCIS) component. The AI software originally characterized the lesion as BI-RADS 3 whereas radiologists categorized it as BI-RADS 4A. (Image courtesy of Academic Radiology.)
The study authors noted that 581 of the 715 lesions were benign and 163 of those resulted in an unnecessary biopsy. The researchers also pointed out that only two of the 124 lesions categorized as BI-RADS 3 by the AI system were malignant. Additionally, out of 238 lesions identified as BI-RADS 3 lesions during initial radiologist review, the AI system identified 110 as BI-RADS 2 lesions, and the study authors found that none of those lesions were malignant.
“Considering that AI BI-RADS 2 is safe, we could prevent 11% (18/163) (of) biopsies of benign lesions and 46.2% (110/238) of unnecessary follow-ups,” wrote study co-author Erkin Aribal, M.D., the head of the Department of Radiology at the Acibadem University School of Medicine in Istanbul, Turkey, and colleagues.
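The two percentages in the quote follow directly from the counts it cites. As a minimal sketch (the helper name `share` is hypothetical and only for illustration), the arithmetic can be checked like this:

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts as quoted by the study authors.
avoidable_biopsies = share(18, 163)     # benign-lesion biopsies that could be avoided
avoidable_followups = share(110, 238)   # radiologist BI-RADS 3 lesions the AI rated BI-RADS 2

print(f"Avoidable biopsies of benign lesions: {avoidable_biopsies}%")
print(f"Avoidable follow-ups: {avoidable_followups}%")
```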
However, the researchers cautioned that the AI modality had significantly lower specificity (65.40 percent) in comparison to the two reviewing radiologists (80.72 percent and 75.56 percent). The AI model also had a lower accuracy rate (71.61 percent vs. 84.06 percent and 79.72 percent for the radiologists) and a lower positive predictive value (39.64 percent vs. 54.10 percent and 47.99 percent for the reviewing radiologists).
Three Key Takeaways
- High sensitivity and negative predictive value of AI in breast ultrasound. The AI system demonstrated a high sensitivity rate of 98.51 percent in detecting breast cancer lesions, comparable to the sensitivity rates of the two reviewing radiologists. The negative predictive value (NPV) of the AI system was 99.48 percent, indicating its ability to correctly identify benign lesions and potentially reduce unnecessary biopsies.
- Reduction in unnecessary biopsies and follow-ups. The study suggests that AI-assessed BI-RADS 2 classifications were not associated with malignancy, potentially allowing for a reduction in unnecessary biopsies for lesions classified as BI-RADS 2. Additionally, the AI system identified a significant portion of lesions initially categorized as BI-RADS 3 by radiologists as BI-RADS 2, none of which were malignant. This finding implies a potential decrease in unnecessary follow-up procedures.
- Cautious consideration needed for AI-based upgrades. While AI showed promising results in sensitivity and NPV, the study highlights some limitations, such as lower specificity, accuracy, and positive predictive value compared to reviewing radiologists. The significantly higher number of false positives with AI underscores the importance of careful consideration and validation of AI-based upgrades. The authors emphasize the need for evaluating clinical findings, patient history, and risk factors before making decisions based solely on AI results.
While the study authors noted comparable numbers of true positive diagnoses between AI and radiologists, they pointed out that the AI modality had a significantly higher number of false positives (201) in contrast to the reviewing radiologists (112 and 142).
“These findings emphasize the need for cautious consideration of AI-based upgrades, highlighting the importance of reevaluating clinical findings, history, and risk factors before making any upgrades based on AI results,” noted Aribal and colleagues.
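The reported percentages can be reproduced from a standard 2x2 table at the lesion level. The sketch below is illustrative only and rests on stated assumptions: the article gives 715 lesions (581 benign, so 134 malignant) and the false positive counts (201, 112 and 142), while the true positive counts (132, 132 and 131) are not stated in the article and are inferred here from the reported sensitivities. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Lesion-level 2x2 counts for one reader (hypothetical helper type)."""
    tp: int  # malignant lesions called suspicious
    fp: int  # benign lesions called suspicious
    fn: int  # malignant lesions called non-suspicious
    tn: int  # benign lesions called non-suspicious

    def metrics(self) -> dict[str, float]:
        """Return the standard diagnostic metrics as percentages (2 decimals)."""
        total = self.tp + self.fp + self.fn + self.tn
        values = {
            "sensitivity": self.tp / (self.tp + self.fn),
            "specificity": self.tn / (self.tn + self.fp),
            "ppv": self.tp / (self.tp + self.fp),
            "npv": self.tn / (self.tn + self.fn),
            "accuracy": (self.tp + self.tn) / total,
        }
        return {name: round(100 * value, 2) for name, value in values.items()}


def from_reported(true_pos: int, false_pos: int,
                  malignant: int = 134, benign: int = 581) -> ConfusionCounts:
    """Build the 2x2 table from the article's totals and one reader's counts."""
    return ConfusionCounts(
        tp=true_pos,
        fp=false_pos,
        fn=malignant - true_pos,
        tn=benign - false_pos,
    )


# False positive counts are from the article; true positive counts are an
# ASSUMPTION inferred from the reported sensitivities (e.g. 132/134 = 98.51%).
readers = {
    "AI": from_reported(true_pos=132, false_pos=201),
    "Radiologist 1": from_reported(true_pos=132, false_pos=112),
    "Radiologist 2": from_reported(true_pos=131, false_pos=142),
}

for name, counts in readers.items():
    print(name, counts.metrics())
```

Under these inferred counts, every metric matches the figures quoted in the article. This also shows why a near-identical sensitivity and NPV can coexist with a much lower specificity and positive predictive value: the readers differ almost entirely in their false positive counts.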
In regard to study limitations, the authors acknowledged that variable quality in image acquisition with handheld ultrasound can affect the use of AI. While criteria from the American College of Radiology (ACR) usually require two to three years of stability on ultrasound for benign classification of lesions, the researchers acknowledged using a one-year ultrasound stability standard to determine that certain lesions were benign.