In a study of over 1,500 patients, researchers found that an emerging autonomous artificial intelligence (AI) tool had significantly higher sensitivity than radiology reports for abnormal posteroanterior chest radiographs and for radiographs with critical findings.
Could autonomous artificial intelligence (AI) have an emerging role in assessing chest X-rays?
In a new multicenter retrospective study involving a total of 1,529 patients (mean age of 69), researchers compared autonomous AI (ChestLink version 2.6, Oxipit) with radiologist reporting on 1,100 abnormal posteroanterior chest radiographs, 617 of which were critical abnormal radiographs, and 429 normal radiographs.
The study authors reported a 99.1 percent sensitivity rate for AI on abnormal radiographs in comparison to a 72.3 percent sensitivity for radiologist reports. Autonomous AI also yielded a sensitivity 6.3 percentage points higher than that of reporting radiologists for critical abnormal X-rays (99.8 percent vs. 93.5 percent), according to the recently published study in Radiology.
“In this consecutive multicenter study, a commercial AI tool intended for autonomous reporting of normal chest radiographs demonstrated a sensitivity of 99.1 percent (1090 of 1100) for abnormal findings, which was significantly higher than that with the clinical radiologic reports, and with only one false-negative “critical” chest radiograph – a subtle pneumonic opacity,” wrote study co-author Michael B. Andersen, M.D., Ph.D., who is affiliated with the Department of Radiology at the Herlev and Gentofte Hospital in Copenhagen, Denmark, and colleagues.
The researchers noted the potential of the AI modality to help alleviate radiologist workload. Andersen and colleagues found that 120 of the 1,529 posteroanterior chest X-rays could have been correctly identified as normal with autonomous AI, accounting for 7.8 percent of the reviewed radiographs. They pointed to an even higher percentage of normal autonomous AI findings in the outpatient group (11.6 percent) in comparison to inpatients and those who had X-rays in the emergency department (6.2 percent).
“These results suggest that an outpatient setting with a high prevalence of normal chest radiographs is a particularly good setting for these AI models,” posited Andersen and colleagues. “Our results showed that there was no statistically significant difference in the automation rate of normal posteroanterior chest radiographs across the four hospitals even though the percentages of outpatients were different.”
The study authors noted overall sensitivity rates for autonomous AI at the four hospitals included in the study were above 96 percent and added that the AI tool had 100 percent sensitivity for critical and other remarkable findings among outpatients.
However, Andersen and colleagues also pointed out that radiology reporting had a significantly higher specificity rate for abnormal X-rays (91.8 percent) in comparison to the AI tool (28 percent). Additionally, 190 of the 299 false-negative findings (64 percent) in the radiology reports were noted as “abnormal, unremarkable” findings, according to the study.
“ … It is not meaningful to compare diagnostic accuracy of the AI tool with that of radiologists as radiologists must balance sensitivity and specificity and the deployed AI tool is optimized solely to minimize false-negative findings. Hence, the AI tool and the radiologists are working at entirely different operating points. A radiologist could likely achieve a performance similar to that of the AI tool, but that would not be meaningful in clinical practice,” maintained Andersen and colleagues.
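For readers who want to check the arithmetic behind these operating points, the AI tool's headline rates can be reproduced from the counts quoted in this article. The short Python sketch below is illustrative only and is not taken from the study. It assumes that the 120 radiographs correctly identified as normal are the AI tool's true negatives among the 429 normal radiographs, which is consistent with the reported 28 percent specificity. The variable and function names are hypothetical.

```python
# Counts quoted in the article for the autonomous AI tool.
ABNORMAL = 1100           # abnormal posteroanterior chest radiographs
NORMAL = 429              # normal radiographs
AI_TRUE_POSITIVES = 1090  # abnormal radiographs flagged by the AI tool
AI_TRUE_NEGATIVES = 120   # assumption: normals the AI tool correctly cleared


def rate(numerator: int, denominator: int) -> float:
    """Return a proportion expressed as a percentage."""
    return 100.0 * numerator / denominator


TOTAL = ABNORMAL + NORMAL

# Sensitivity: share of abnormal radiographs the tool flagged.
ai_sensitivity = rate(AI_TRUE_POSITIVES, ABNORMAL)

# Specificity: share of normal radiographs the tool cleared.
ai_specificity = rate(AI_TRUE_NEGATIVES, NORMAL)

# Automation rate: share of all radiographs the tool could have reported as normal.
automation_rate = rate(AI_TRUE_NEGATIVES, TOTAL)

print(f"Sensitivity:     {ai_sensitivity:.1f} percent")
print(f"Specificity:     {ai_specificity:.1f} percent")
print(f"Automation rate: {automation_rate:.1f} percent")
```

The low specificity reflects the design choice described in the quote above: a tool tuned to miss almost no abnormal radiographs will clear only the normal radiographs it is most certain about.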
With regard to study limitations, the authors acknowledged the retrospective nature of the study and the lack of actual implementation of the AI modality. Pointing out that unremarkable findings are not routinely reported at the facilities where the study was conducted, the researchers said the sensitivity noted with radiology reports was “probably underestimated.”