May 20, 2026

Comparative Radiology Study Reveals ‘Significant Variability’ with AI CXR Software for Lung Cancer Detection

Author(s): Jeff Hall

A retrospective comparison of AI platforms from seven different manufacturers for stand-alone lung cancer detection revealed sensitivity rates ranging from 20.8 percent to 77.8 percent.

There are significant differences in the effectiveness of commercial AI software platforms for detecting lung cancer on chest radiographs, according to a new head-to-head comparison study involving software from seven different manufacturers.

For the retrospective study, recently reported in Radiology, researchers compared the stand-alone performance of multiple AI software modalities for detecting lung cancer on chest X-ray in a cohort of 5,235 patients (median age of 60). The study authors noted that 1.4 percent of the cohort had diagnosed lung cancer with a visible tumor.

The reviewed AI devices included: Annalise Enterprise CXR version 3.8 (Harrison.ai); ChestView version 1.5.X (Gleamer); InferRead DR Chest version 1.0.0.1 (InferVision); TechCare Chest version 2.1 (Milvue); ChestEye version 2.6 (Oxipit); qXR version 4.1 (Qure.ai); and Rayscape CXR (multiple versions) (Rayscape). While all the reviewed AI models have received the CE mark, only one (qXR) has been cleared by the Food and Drug Administration (FDA), according to the study.

The researchers found that sensitivity rates for the reviewed devices ranged from 20.8 percent to 77.8 percent. Specificity ranged from 58.9 percent to 98.4 percent and positive predictive value (PPV) ranged from 1.5 percent to 28.4 percent, according to the study authors.
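For readers less familiar with how these metrics interact, the following minimal Python sketch shows how sensitivity and specificity translate into expected true positives, false positives and PPV in a cohort with the reported composition (72 patients with visible confirmed cancer out of 5,235). The pairings of sensitivity and specificity below are hypothetical. The article reports only the ranges across devices, not which values belong to the same device, so the output illustrates the arithmetic and does not reproduce any device's results.

```python
def confusion_counts(n_total, n_positive, sensitivity, specificity):
    """Expected confusion-matrix counts for a cohort of known composition."""
    n_negative = n_total - n_positive
    tp = sensitivity * n_positive   # cancers flagged by the device
    fn = n_positive - tp            # cancers missed by the device
    tn = specificity * n_negative   # non-cancer patients correctly cleared
    fp = n_negative - tn            # non-cancer patients flagged
    return tp, fp, tn, fn


def ppv(tp, fp):
    """Positive predictive value: share of flagged patients who have cancer."""
    flagged = tp + fp
    return tp / flagged if flagged else float("nan")


# Cohort figures reported in the article.
N_TOTAL, N_CANCER = 5235, 72

# HYPOTHETICAL pairings built from the ends of the reported ranges.
# The article does not say these values came from the same device.
profiles = {
    "high sensitivity, low specificity": (0.778, 0.589),
    "low sensitivity, high specificity": (0.208, 0.984),
}

for label, (sens, spec) in profiles.items():
    tp, fp, tn, fn = confusion_counts(N_TOTAL, N_CANCER, sens, spec)
    print(f"{label}: TP~{tp:.0f}, FP~{fp:.0f}, PPV~{ppv(tp, fp):.1%}")
```

Because roughly 1.4 percent of the cohort had visible cancer, nearly all patients are cancer-free, so even a small drop in specificity adds many false positives and pulls PPV down.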

“The devices demonstrated variability in diagnostic accuracy with poor agreement of classification outputs between devices and with radiologist reporting. Similar patterns were observed across all 5,235 included patients and in the subset of 72 patients with visible confirmed cancer, suggesting that the observed variation was due to inherent differences in model performance rather than prevalence effects,” noted lead study author Ahmed Maiter, MB BChir, MA, FACR, who is affiliated with the Department of Radiology at Sheffield Teaching Hospitals in Sheffield, U.K., and colleagues.

The study authors pointed out that, compared with radiologist reports, all the reviewed devices produced additional false positive results for tumor detection, ranging from 10 to 2,039.

“FP results translate to unnecessary additional investigations being performed, even with oversight by experienced radiologists, who may be reluctant to overrule AI device results because of automation bias and concerns about accountability,” said Maiter and colleagues. “ … If these devices were deployed to triage patients with positive results straight to CT, they would result in markedly different numbers of additional CT examinations performed, with associated differences in financial cost, environmental impact, burden for reporting worklists, and impact on radiology service delivery.”

Three Key Takeaways

  • Significant performance variability exists across AI platforms. Sensitivity ranged from 20.8 percent to 77.8 percent and specificity varied from 58.9 percent to 98.4 percent across the seven reviewed devices. Clinicians should not assume CE-marked or commercially available AI tools are interchangeable for lung cancer detection on chest X-ray.
  • False positive burden is a real clinical concern. Compared to radiologist reporting, all devices generated additional false positives, ranging from 10 to more than 2,000, which can drive unnecessary downstream CT imaging, increased costs, and strain on radiology workflows. Automation bias may also make radiologists reluctant to override AI findings, amplifying this risk.
  • Intended clinical use should drive AI platform selection. The optimal tool depends on the deployment goal: a high-specificity, high-PPV device is preferable for triaging patients directly to CT, while a high-sensitivity, high-NPV device better suits worklist prioritization or augmenting radiologist reads. There is no one-size-fits-all solution, and institutions should align platform choice with their specific workflow objectives.

Calling for more comparative studies to evaluate AI CXR software modalities for lung cancer detection, the researchers maintained that intended use of the AI platforms for navigating burgeoning worklists is key to optimal selection and eventual deployment.

“If the aim is to triage those patients with radiographic appearances suspicious for lung cancer straight to CT, then a device with higher specificity and PPV is likely to be desirable. Conversely, if it is used to prioritize reporting worklists or improve radiologist accuracy, then a device with higher sensitivity and NPV may be preferable,” added Maiter and colleagues.
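The trade-off the authors describe can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's method. The two device profiles, the function names `device_summary` and `pick_device`, and the intended-use labels are all hypothetical, and the selection rule (highest PPV for direct triage to CT, highest NPV for worklist prioritization) is a deliberate simplification of the authors' guidance.

```python
def device_summary(n_total, n_positive, sensitivity, specificity):
    """Expected workload and predictive values for one device profile."""
    n_negative = n_total - n_positive
    tp = sensitivity * n_positive
    fn = n_positive - tp
    tn = specificity * n_negative
    fp = n_negative - tn
    flagged, cleared = tp + fp, tn + fn
    return {
        "ct_referrals": flagged,   # CTs if every positive goes straight to CT
        "missed_cancers": fn,
        "ppv": tp / flagged if flagged else float("nan"),
        "npv": tn / cleared if cleared else float("nan"),
    }


def pick_device(summaries, intended_use):
    """Simplified rule: PPV for direct CT triage, NPV for worklist use."""
    metric = {"triage_to_ct": "ppv", "worklist_prioritization": "npv"}[intended_use]
    return max(summaries, key=lambda name: summaries[name][metric])


# Cohort composition from the article; device profiles are HYPOTHETICAL.
N_TOTAL, N_CANCER = 5235, 72
summaries = {
    "Device A": device_summary(N_TOTAL, N_CANCER, 0.778, 0.589),
    "Device B": device_summary(N_TOTAL, N_CANCER, 0.208, 0.984),
}

for use in ("triage_to_ct", "worklist_prioritization"):
    name = pick_device(summaries, use)
    s = summaries[name]
    print(f"{use}: {name} (PPV {s['ppv']:.1%}, NPV {s['npv']:.1%}, "
          f"~{s['ct_referrals']:.0f} CT referrals)")
```

In practice, a department would also weigh missed cancers, reporting capacity and cost, none of which this sketch models.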

(Editor’s note: For related content, see “Eleven Takeaways from New Analysis of CT-Based AI for Lung Cancer Screening,” “Can Density Homogeneity on Chest CT Improve Differentiation of Sub-Centimeter Nodules?” and “Do Significant Incidental Findings on Low-Dose CT Lead to Elevated Risks for Extrapulmonary Cancer?”)

Beyond the inherent limitations of a single-center retrospective study, the authors acknowledged the use of one radiography system to obtain the reviewed images in the study. The researchers conceded the lack of quantitative lesion-level analysis and noted considerable variability with the output classes provided by manufacturers. While the study focused on stand-alone performance of the AI software modalities, the authors noted that these devices may be utilized differently in clinical practice.

