Study results show DCNN can use imaging biomarkers to predict sex, potentially confounding the accurate prediction of disease.
A deep convolutional neural network (DCNN) can differentiate between males and females on chest X-ray, according to new research.
Based on existing clinical experience, DCNNs have been shown to be biased against men or women if they are trained on datasets that do not have balanced sex representation. Until now, though, it has been unknown whether the algorithms can use visual biomarkers that are beyond a radiologist’s perception to accurately predict sex.
In a poster presented during the Society for Imaging Informatics in Medicine (SIIM) 2021 Virtual Annual Meeting, David Li, from the University of Ottawa, detailed the predictive efficacy of a DCNN trained on datasets that included equal numbers of males and females.
“DCNNs trained on two large chest X-ray datasets accurately predicted sex on internal and external test data with similar heatmaps across DCNN architectures and datasets,” he said. “These findings support the notion that DCNNs can leverage imaging biomarkers to predict sex and potentially confound the accurate prediction of disease on chest X-rays and contribute to biased models.”
To test the DCNNs' ability to predict sex, as well as to evaluate the visual biomarkers used, Li gathered chest X-ray data from the Stanford CheXpert and National Institutes of Health (NIH) ChestX-ray14 datasets, which include 224,316 and 112,120 scans, respectively, from two heterogeneous patient populations. Random under-sampling reduced the data volume to 97,560 images that were balanced to 50 percent male and 50 percent female.
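For readers unfamiliar with random under-sampling, the idea can be sketched in a few lines of Python. This is a minimal illustration on toy records, not the code used in the study; the function name `undersample_balanced`, the `sex` and `image_id` keys, and the seed are all assumptions made for the example.

```python
import random
from collections import defaultdict


def undersample_balanced(records, label_key="sex", seed=0):
    """Randomly drop records from larger classes until each class
    has as many records as the smallest class."""
    rng = random.Random(seed)

    # Group records by their label value (e.g. "M" or "F").
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)

    # Every class is cut down to the size of the smallest class.
    target = min(len(group) for group in by_label.values())

    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))

    rng.shuffle(balanced)  # avoid leaving the classes in contiguous blocks
    return balanced


# Hypothetical toy data: 20 male and 10 female records.
records = [{"image_id": i, "sex": "M" if i % 3 else "F"} for i in range(30)]
balanced = undersample_balanced(records)
```

On the toy data, the majority class is reduced to match the minority class, so the result is split evenly between the two labels at the cost of discarding some images.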
Overall, the dataset was split into 70 percent training, 10 percent validation, and 20 percent test sets. For transfer learning, Li used multiple DCNN architectures pre-trained on ImageNet: Inception-V3, ResNet-18, ResNet-50, and VGG-19. The models were also externally validated.
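A 70/10/20 split of this kind can be sketched as a shuffle followed by slicing. The snippet below is only an illustration under assumptions: the function name `split_70_10_20` and the seed are hypothetical, integer IDs stand in for images, and the poster does not say how the split was actually drawn.

```python
import random


def split_70_10_20(items, seed=0):
    """Shuffle items and cut them into 70% train, 10% validation, 20% test."""
    items = list(items)
    random.Random(seed).shuffle(items)

    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(items) * 70 // 100
    n_val = len(items) * 10 // 100

    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains, roughly 20%
    return train, val, test


# Stand-in IDs for the 97,560 balanced images mentioned in the article.
train_set, val_set, test_set = split_70_10_20(range(97560))
print(len(train_set), len(val_set), len(test_set))
```

The printed sizes are what an exact 70/10/20 cut of 97,560 items would give; they are arithmetic on the reported total, not figures taken from the study.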
According to his analysis, on the internal test set, DCNNs trained with both datasets reached an area under the curve (AUC) ranging from 0.98 to 0.99. External validation showed a peak cross-dataset performance of 0.94 for the VGG19-Stanford model and 0.95 for the InceptionV3-NIH model.
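As a reminder of what an AUC value measures, it can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it that way on made-up labels and scores; the function name `roc_auc` and the toy values are assumptions, and the pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive scores higher.
    Tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs: 1 = one sex, 0 = the other.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 1.0 means every positive case outranks every negative case, while 0.5 corresponds to chance-level ranking.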
Additionally, heatmaps showed similar attention areas between model architectures and datasets. They were localized to the mediastinal and upper rib regions and to the lower chest and diaphragmatic regions.