In a review of 555 neuroimaging-based artificial intelligence (AI) models for the diagnosis of psychiatric disorders, researchers found that nearly 72 percent of the AI models had an inadequate sample size and over 99 percent did not adequately handle data complexity.
Noting a bevy of concerns over small sample sizes, poor reporting completeness and insufficient handling of data complexity, the authors of a new meta-analysis claimed a significant majority of neuroimaging-based artificial intelligence (AI) models for psychiatric disorder diagnosis are inadequate for clinical application.
For the meta-analysis, recently published in JAMA Network Open, the authors reviewed 517 studies that presented a total of 555 neuroimaging-based AI models for use in detecting psychiatric disorders. For systematic assessment of the AI models for risk of bias and reporting quality, the researchers utilized the Prediction Model Risk of Bias Assessment Tool (PROBAST) and the modified Checklist for Evaluation of Image-Based Artificial Intelligence Reports (CLEAR) systems, according to the study.
Out of 555 AI models, the researchers found that 83.1 percent (461 models) had a high risk of bias (ROB). The meta-analysis authors also noted inadequate sample sizes in 71.7 percent (398 models) and insufficient handling of data complexity in 99.1 percent (550 models) of the AI models.
While sample sizes of more than 200 participants have become acceptable for AI models, the study authors noted this was not the case for the majority of the models reviewed in the meta-analysis.
“ … About 80% of AI models for psychiatric diagnosis were trained on far smaller samples, leading to a high (ROB) and resulting in poor generalizability,” wrote study co-author Hu Chuan-Peng, Ph.D., who is affiliated with the School of Psychology at Nanjing Normal University in Nanjing, China, and colleagues. “On the other hand, model configurations, including data leakage, performance optimization, and absence of handling data complexities, represented key challenges to increasing model ROB … On balance, a high ROB in the analysis domain should be addressed to prompt clinical applications in future AI models.”
The meta-analysis authors also noted that 38.8 percent of the AI models were plagued by incomplete reporting and 60.1 percent had incomplete technical assessment.
“Going forward, complete and transparent reporting quality is imperative for applicable AI diagnostic models in clinical practice,” emphasized Chuan-Peng and colleagues.
Regarding study limitations, the authors noted they did not test the clinical performance of the AI models. They also acknowledged rating heterogeneity in the benchmarks employed in the meta-analysis and suggested that the lack of a suitable benchmark may have led to overestimation of ROB. Chuan-Peng and colleagues posited that the development of a psychiatry-specific, image-based AI model may bolster the reporting quality associated with these models.