The deep learning algorithm can distinguish between malignant and benign nodules at initial screening.
Using a deep learning algorithm with low-dose chest CT can help radiologists accurately estimate the risk that an identified pulmonary nodule is malignant.
Low-dose CT (LDCT) is effective in screening individuals who are at high risk for lung cancer, such as long-time smokers, and the number of people undergoing these scans is increasing. But correctly distinguishing cancerous nodules from benign ones remains a significant challenge, and accurate assessments are critical because they drive treatment decisions.
In an article published May 18 in Radiology, investigators from the Netherlands shared details about their artificial intelligence (AI) tool, which outperformed subspecialty-trained radiologists, and the role it could play in identifying affected patients as early as possible.
“We successfully developed a deep learning algorithm for malignancy risk estimation of pulmonary nodules detected at low-dose screening CT that was generalizable across screening populations and protocol,” said the team led by first author Kiran Vaidhya Venkadesh, a doctoral candidate with the Diagnostic Image Analysis Group at Radboud University Medical Center. “This deep learning algorithm may aid radiologists in optimizing follow-up recommendations for participants undergoing lung cancer screening and may lead to fewer unnecessary diagnostic interventions.”
It also holds the potential to reduce radiologists’ workload and lower the cost of lung cancer screening.
To judge the algorithm’s performance, the team compared it with the established Pan-Canadian (PanCan) Early Detection of Lung Cancer model, as well as with 11 clinicians: four thoracic radiologists, five radiology residents, and two pulmonologists.
In their retrospective study, they used deep learning to develop their algorithm, which is freely available, and trained it on CT images of 16,077 nodules, including 1,249 malignancies. The images were collected between 2002 and 2004 from the National Lung Screening Trial. They validated the algorithm using three sets of imaging data from the Danish Lung Cancer Screening Trial: a full cohort of 883 nodules (65 malignant) and two cancer-enriched cohorts, one with size matching (175 nodules, 59 malignant) and one without (177 nodules, 59 malignant).
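To make the term "cancer-enriched" concrete, the short Python sketch below computes the share of malignant nodules in each data set from the counts reported above. The dictionary keys and the helper name are illustrative labels, not terms from the study.

```python
# Nodule counts as reported in the article: (total nodules, malignant nodules).
# The cohort labels are illustrative names chosen for this sketch.
cohorts = {
    "National Lung Screening Trial (training)": (16077, 1249),
    "Danish trial, full cohort": (883, 65),
    "Danish trial, enriched, size-matched": (175, 59),
    "Danish trial, enriched, not size-matched": (177, 59),
}


def malignancy_fraction(total, malignant):
    """Return the fraction of nodules in a cohort that are malignant."""
    return malignant / total


fractions = {
    name: malignancy_fraction(total, malignant)
    for name, (total, malignant) in cohorts.items()
}

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

In the training set and the full validation cohort, roughly 7 to 8 percent of nodules are malignant, while in the two enriched cohorts about one third are.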
When Venkadesh’s team compared the algorithm’s performance with that of the existing assessment model and of the clinicians, they found that the algorithm outperformed both. Against the PanCan model, the algorithm achieved an area under the curve (AUC) of 0.93 compared to 0.90.
“The algorithm significantly outperformed the PanCan model only in the size-matched cancer-enriched subset,” the team explained. “This suggests that although nodule size remains a strong predictor for malignancy, the algorithm relies more on imaging characteristics for its discriminative power than does the PanCan model.”
It also outperformed the thoracic radiologists in cancer-enriched cohorts with both random benign nodules (AUC 0.96 versus 0.90) and size-matched benign nodules (AUC 0.86 versus 0.82).
Based on these results, the team said, the algorithm could offer several benefits to the clinical environment. Radiologists can upgrade suspicious nodules to the Lung-RADS 4X category, but the algorithm does not require manual interpretation of nodule imaging characteristics. This could potentially lead to a reduction in the substantial interobserver variability in CT interpretation, said senior author Colin Jacobs, Ph.D., assistant professor of medical imaging at Radboud.
Ultimately, the team said, they see this algorithm being used as a tool to support radiologists’ efforts.
“We foresee a demand for trained human observers, aided by reliable artificial intelligence systems, that act as first readers of chest CT when lung cancer screening programs are introduced worldwide,” the team said.
In an accompanying editorial, PanCan developer Martin C. Tammemägi, DVM, MSc, Ph.D., reiterated the need for an algorithm that can distinguish between malignant and benign nodules, helping to alleviate providers’ workload. He noted that Venkadesh’s team did improve upon past AI prediction models, demonstrating promising results. But he warned against placing too much emphasis on the AUC achievements.
“I caution readers not to over-interpret the AUC. Often the AUC is interpreted directly as a measure of predictive accuracy,” he explained. “The AUC is not a percentage…the AUC does not measure absolute classification accuracy, but rather it assesses whether the model can place case-noncase pairing in correct rank order.”
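The rank-order reading of the AUC described in the editorial can be illustrated with a small Python sketch. It counts, over every possible pairing of one malignant and one benign nodule, how often the malignant nodule receives the higher score. The scores are invented for illustration and are not data from the study, and the function name is hypothetical.

```python
from itertools import product


def pairwise_auc(case_scores, noncase_scores):
    """AUC as the fraction of case/non-case pairs ranked correctly.

    A pair counts as 1 when the case scores higher than the non-case,
    0.5 when the two scores tie, and 0 otherwise.
    """
    pairs = list(product(case_scores, noncase_scores))
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0 for c, n in pairs)
    return wins / len(pairs)


# Hypothetical risk scores, not data from the study.
malignant = [0.9, 0.7, 0.4]
benign = [0.8, 0.3, 0.2, 0.1]

auc = pairwise_auc(malignant, benign)
print(f"AUC = {auc:.3f}")
```

Here 10 of the 12 malignant-benign pairs are ordered correctly, so the AUC is about 0.83. That figure says nothing about what share of individual nodules would be classified correctly at any given risk threshold.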
This makes algorithm calibration critical, he said. And given that the algorithm, in some instances, misidentified a malignant nodule as benign and vice versa with almost complete certainty, there could be a calibration problem.
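The distinction between ranking and calibration can be sketched in Python under simple assumptions. Two sets of invented scores order five nodules identically, so they have the same AUC, but one set is pushed toward 0 and 1. The Brier score (the mean squared difference between predicted risk and outcome) is used here as one common way to penalize confident errors; it is not necessarily the measure used in the study or the editorial.

```python
import math


def pairwise_auc(case_scores, noncase_scores):
    """Fraction of case/non-case pairs in which the case scores higher (ties count half)."""
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in case_scores
        for n in noncase_scores
    )
    return wins / (len(case_scores) * len(noncase_scores))


def brier_score(scores, labels):
    """Mean squared difference between predicted risk and actual outcome (lower is better)."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)


def sharpen(score, steepness=20.0):
    """Strictly increasing map that pushes scores toward 0 or 1 without changing their order."""
    return 1.0 / (1.0 + math.exp(-steepness * (score - 0.5)))


# Hypothetical data, not from the study: 1 = malignant, 0 = benign.
labels = [1, 1, 0, 0, 0]
moderate = [0.7, 0.4, 0.6, 0.3, 0.2]
extreme = [sharpen(s) for s in moderate]


def split(scores):
    """Separate scores into those of malignant and benign nodules."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    noncases = [s for s, y in zip(scores, labels) if y == 0]
    return cases, noncases


auc_moderate = pairwise_auc(*split(moderate))
auc_extreme = pairwise_auc(*split(extreme))
brier_moderate = brier_score(moderate, labels)
brier_extreme = brier_score(extreme, labels)

print(f"AUC:   moderate={auc_moderate:.3f}  extreme={auc_extreme:.3f}")
print(f"Brier: moderate={brier_moderate:.3f}  extreme={brier_extreme:.3f}")
```

Both score sets rank the nodules the same way and therefore share one AUC, yet the extreme scores earn a worse Brier score because the two misranked nodules are now wrong with near certainty.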
“If a clinician’s judgement were influenced by the [deep learning] algorithm’s extreme and incorrect score… it is possible to imagine that harms could be done,” he said.
Still, he said, the relatively high AUCs achieved by the algorithm do indicate that it is picking up valuable predictive information from non-size factors.
According to Venkadesh’s team, though, their work is not complete. They are currently working on another algorithm that takes multiple CT exams as input, potentially expanding its use from initial or baseline screenings to subsequent screenings, where it will be useful to compare nodule growth and appearance with previous scans.