Artificial Intelligence vs. Radiology Residents: Who Reads Chest X-rays Better?

October 9, 2020

A well-trained algorithm can read chest X-rays as well as third-year residents, opening the door to more streamlined workflows and cost savings.

Artificial intelligence (AI) algorithms can read chest X-rays as well as radiology residents, potentially fast-tracking interpretations, saving resources, and freeing up residents for other duties.

Chest X-rays are the most common diagnostic study used in emergency departments, so finding a way to streamline the workflow around them can be critical, said a team from IBM Almaden Research. In a study published Oct. 9 in JAMA Network Open, the team, led by postdoctoral researcher Joy T. Wu, MBChB, MPH, revealed that their well-trained AI algorithm performed at or above the level of third-year radiology residents in identifying various characteristics on radiographs.

“This study points to the potential use of AI systems in future radiology workflows for preliminary interpretations that target the most prevalent findings, leaving the final reads performed by the attending physician to still catch any potential misses from the less-prevalent fine-grained findings,” the team said.


Making it possible for attending radiologists to quickly correct automatically produced reads could lead to expedited dictation-driven radiology workflows, improved accuracy, and reduced costs of care, they explained.

To assess how their algorithm would work in a real-world setting, the team collected data and evaluated both algorithm and resident performance between February 2018 and July 2020. The team compared how well the algorithm read anteroposterior (AP) frontal chest X-rays against the performance of five third-year residents from academic medical centers around the country.

A training data set of 342,126 frontal chest X-rays from emergency rooms and urgent care centers was used to train the algorithm. The team also used a study data set of 1,998 AP images whose ground truth was established through a triple-consensus process with adjudication, covering more than 72 chest X-ray finding labels. To judge comparative performance, the team focused on nine findings: airway tubes, pleural effusion, opacities, pulmonary edema, cardiomegaly, atelectasis, central vascular lines, consolidation, and no anomalies.

According to the team, the deep learning model was trained on all 72 finding labels. Each resident read a non-overlapping set of approximately 400 images, and all were unaware of the AI algorithm's estimates.

Based on their analysis, the team determined that mean image-based sensitivity was 0.716 for the algorithm and 0.720 for the residents. Positive predictive value was 0.730 for the algorithm and 0.682 for the residents. Specificity for the algorithm and the residents was 0.980 and 0.973, respectively.
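As a point of reference for readers less familiar with these metrics, the sketch below shows how sensitivity, positive predictive value, and specificity are derived from a confusion matrix. The function and the counts are illustrative assumptions, not data from the study; the counts are simply chosen so the outputs land near the algorithm's reported values.

```python
# Sketch of how the three reported metrics are computed from
# confusion-matrix counts. All counts here are illustrative, not the
# study's actual data.

def preliminary_read_metrics(tp, fp, fn, tn):
    """Return (sensitivity, ppv, specificity) for one finding label."""
    sensitivity = tp / (tp + fn)   # share of true findings that were caught
    ppv = tp / (tp + fp)           # share of flagged findings that were real
    specificity = tn / (tn + fp)   # share of normal cases correctly cleared
    return sensitivity, ppv, specificity

# Hypothetical counts chosen to roughly reproduce the algorithm's
# reported values (sensitivity 0.716, PPV 0.730, specificity 0.980).
sens, ppv, spec = preliminary_read_metrics(tp=716, fp=265, fn=284, tn=12985)
print(sens, ppv, spec)
```

Note that sensitivity and specificity depend only on how well each class is recognized, while PPV also depends on how prevalent a finding is; this is one reason the study reports all three side by side.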

Preliminary Read Performance Differences Between Radiology Residents and AI Algorithm

| Method | No. Images | No. Findings | PPV | Sensitivity | Specificity |
| --- | --- | --- | --- | --- | --- |
| Residents | 1,998 | 72 | 0.682 (0.670-0.694) | 0.720 (0.709-0.732) | 0.973 (0.971-0.974) |
| Algorithm | 1,998 | 72 | 0.730 (0.718-0.742) | 0.716 (0.704-0.729) | 0.980 (0.979-0.981) |
| AI vs Residents, P value | N/A | N/A | .001 | .66 | <.001 |

When examining performance on individual findings, the team found that residents' operating points were on or very near the algorithm's receiver operating characteristic (ROC) curve for four findings (no anomalies, opacities, pleural effusion, and airway tubes). They were below the ROC curve for two findings (pulmonary edema and cardiomegaly), and above it for three findings (atelectasis, central vascular lines, and consolidation).

AI performance was similar to the residents' for tubes, lines, and non-anomalous reads, but the algorithm outperformed residents on higher-prevalence labels, such as cardiomegaly, pulmonary edema, subcutaneous air, and hyperaeration. It did not perform as well in interpreting masses or nodules and an enlarged hilum.

The outcome of the study is relevant to the AI and radiology communities in several ways, the team said. Not only did the team show it is possible to build an algorithm that performs comparably to radiology residents, but the large cadre of included clinicians also highlighted the integral role radiologists and clinical experts can play in developing algorithms. In addition, including so many findings likely meant that confounding factors typically hidden in previous efforts were more fully covered. The study also shows that a single neural network can capture a wide variety of fine-grained findings and optimize their prediction.

Ultimately, the team said, the results indicate that the algorithm could make an impactful change in radiology workflow.

“These findings suggest that well-trained AI algorithms can reach performance levels similar to radiology residents in covering the breadth of findings in AP frontal chest radiographs,” they said, “which suggests there is the potential for the use of AI algorithms for preliminary interpretations of chest radiographs in radiology workflows to expedite radiology reads, address resource scarcity, improve overall accuracy, and reduce the cost of care.”
