In a review of 555 neuroimaging-based artificial intelligence (AI) models for the diagnosis of psychiatric disorders, researchers found that nearly 72 percent of the AI models had an inadequate sample size and more than 99 percent did not adequately handle data complexity.
Noting a bevy of concerns over small sample sizes, poor reporting completeness and insufficient handling of data complexity, the authors of a new meta-analysis claimed a significant majority of neuroimaging-based artificial intelligence (AI) models for psychiatric disorder diagnosis are inadequate for clinical application.
For the meta-analysis, recently published in JAMA Network Open, the authors reviewed 517 studies that presented a total of 555 neuroimaging-based AI models for use in detecting psychiatric disorders. To systematically assess the AI models for risk of bias and reporting quality, the researchers used the Prediction Model Risk of Bias Assessment Tool (PROBAST) and the modified Checklist for Evaluation of Image-Based Artificial Intelligence Reports (CLEAR), according to the study.
Out of 555 AI models, the researchers found that 83.1 percent (461 models) had a high risk of bias (ROB). The meta-analysis authors also noted inadequate sample sizes in 71.7 percent (398 models) and insufficient handling of data complexity in 99.1 percent (550 models) of the AI models.
While sample sizes of more than 200 participants have become acceptable for AI models, the study authors noted this was not the case for the majority of the models reviewed in the meta-analysis.
“… About 80% of AI models for psychiatric diagnosis were trained on far smaller samples, leading to a high [ROB] and resulting in poor generalizability,” wrote study co-author Hu Chuan-Peng, Ph.D., who is affiliated with the School of Psychology at Nanjing Normal University in Nanjing, China, and colleagues. “On the other hand, model configurations, including data leakage, performance optimization, and absence of handling data complexities, represented key challenges to increasing model ROB … On balance, a high ROB in the analysis domain should be addressed to prompt clinical applications in future AI models.”
The meta-analysis authors also noted that 38.8 percent of the AI models were plagued by incomplete reporting and 60.1 percent had incomplete technical assessment.
“Going forward, complete and transparent reporting quality is imperative for applicable AI diagnostic models in clinical practice,” emphasized Chuan-Peng and colleagues.
Regarding study limitations, the authors noted they did not test the clinical performance of the AI models. They also acknowledged rating heterogeneity in the benchmarks employed in the meta-analysis and suggested that the lack of a suitable benchmark may have led to an overestimation of ROB. Chuan-Peng and colleagues posited that the development of a psychiatry-specific, image-based AI model may bolster the reporting quality associated with these models.