In separate test sets that included challenging mammography cases, researchers found that artificial intelligence (AI) demonstrated similar sensitivity and specificity for detecting breast cancer in comparison to assessments from over 500 clinicians.
Emerging research from the United Kingdom (U.K.) suggests an artificial intelligence (AI) algorithm, utilizing recall thresholds matched to clinician reader performance, offers comparable sensitivity and specificity rates to those of clinician readers in diagnosing breast cancer on mammography exams.
For the retrospective study, recently published in Radiology, researchers compared the performance of AI software (Lunit Insight MMG, version 220.127.116.11, Lunit) with that of 552 mammography readers in assessing screening mammograms. The two test sets, drawn from the National Health Service Breast Screening Programme (NHSBSP), had a total of 161 normal breasts, 70 malignant breasts and nine benign breasts, according to the study. The researchers said the 552 clinicians included 315 board-certified radiologists, 206 radiographers and 31 breast clinicians.
The study authors noted a 93 percent area under the receiver operating characteristic curve (AUC) for the AI software in comparison to 88 percent for clinician readers. When the researchers employed a recall threshold for AI (> 2.91) that matched the specificity of human readers, they found a 91 percent sensitivity rate and a 77 percent specificity rate for AI in comparison to 90 percent and 76 percent rates, respectively, for clinician detection of breast cancer on screening mammography.
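To illustrate how a recall threshold can be matched to reader specificity, here is a minimal Python sketch. The scores, labels and function names are hypothetical toy values, not data or methods from the study. The sketch assumes a case is recalled when its AI score exceeds the threshold, in line with the "> 2.91" rule reported above.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) when cases with score > threshold are recalled."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)


def match_threshold(scores, labels, target_specificity):
    """Find the lowest observed score that, used as a threshold, reaches the target specificity."""
    for t in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, t)
        if spec >= target_specificity:
            return t, sens, spec
    raise ValueError("target specificity must be at most 1.0")


# Hypothetical toy data: label 1 = malignant, label 0 = not malignant.
toy_scores = [0.5, 1.2, 2.0, 3.5, 1.0, 4.2, 6.8, 1.5, 7.1, 9.0]
toy_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Assume the human readers' specificity on this toy set was 80 percent.
threshold, ai_sensitivity, ai_specificity = match_threshold(toy_scores, toy_labels, 0.8)
```

On this toy set, the matched threshold is 2.0, which gives the AI 80 percent specificity and 80 percent sensitivity. The same comparison at matched specificity is what allows the AI and reader sensitivities to be compared on equal footing.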
“There was no evidence of a difference between AI sensitivity and the mean sensitivity of human readers from either professional group, with 63 of 70 cancers detected for radiologist readers and 62 of 70 cancers detected for non-radiologist readers,” wrote Jonathan J. James, FRCR, a study co-author who is affiliated with the Nottingham Breast Institute at Nottingham University Hospitals NHS Trust in Nottingham, United Kingdom, and colleagues.
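As a quick arithmetic check, the detection counts in the quote can be converted to sensitivity rates. This short Python sketch uses only the counts quoted above; the function name is an illustrative assumption.

```python
def detection_rate(detected, total):
    """Fraction of cancers detected, i.e. sensitivity."""
    return detected / total


radiologist_rate = detection_rate(63, 70)
non_radiologist_rate = detection_rate(62, 70)

print(f"Radiologist readers: {radiologist_rate:.1%}")          # 90.0%
print(f"Non-radiologist readers: {non_radiologist_rate:.1%}")  # 88.6%
```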
The study authors noted that the mean lesion size was 15.5 mm. Of the detected cancers, 64.3 percent were masses, 12.9 percent were calcifications, 11.4 percent were asymmetries and 11.4 percent involved architectural distortions, according to the researchers.
In an accompanying editorial, Liane Philpotts, M.D., suggested that the AI software utilized in the study may help alleviate the impact of radiologist shortages in performing double mammography reading in European countries. The AI software may also be a supplemental consideration for United States radiologists who don’t see a high volume of mammography, according to Dr. Philpotts, a professor of radiology and biomedical imaging at the Yale School of Medicine.
However, Dr. Philpotts cautioned that women between 50 and 70 years of age, who have mammography screening every three years in the U.K., generally have less dense breasts than premenopausal women, and suggested this may have factored into the study findings.
In regard to study limitations, the study authors conceded that the test sets were relatively small and may not have been representative of broader mammography screening populations. Noting that the imaging readers in the study included non-radiologists and that two-dimensional mammography remains the standard of care in the United Kingdom, the researchers said these factors also limit the generalizability of the research findings.