The authors of a new meta-analysis found no significant differences between clinicians and artificial intelligence in diagnosing fractures but conceded that slightly over half of the studies assessed had a high risk of potential bias.
Could artificial intelligence (AI) assessment have comparable diagnostic accuracy to clinician assessment for fracture detection?
In a recently published meta-analysis of 42 studies, the study authors noted 92 percent sensitivity and 91 percent specificity for AI in comparison to 91 percent sensitivity and 92 percent specificity for clinicians based on internal validation test sets. For the external validation test sets, clinicians had 94 percent specificity and sensitivity in comparison to 91 percent specificity and sensitivity for AI, according to the study. In essence, the study authors found no statistically significant differences between AI and clinician diagnosis of fractures.
“The results from this meta-analysis cautiously suggest that AI is noninferior to clinicians in terms of diagnostic performance in fracture detection, showing promise as a useful diagnostic tool,” wrote Dominic Furniss, DM, MA, MBBCh, FRCS(Plast), a professor of plastic and reconstructive surgery in the Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences at the Botnar Research Centre in Oxford, United Kingdom, and colleagues.
The study authors also found that adjunctive use of AI improved clinician detection of fractures to a 97 percent sensitivity and a 92 percent specificity.
Thirty-seven of the reviewed studies used radiographs to diagnose fractures, and the remaining five employed computed tomography (CT). Of the radiographic studies, 18 looked at lower limb fractures, 15 focused on upper extremity fractures and four examined other fracture areas, the researchers noted.
Regarding limitations of the meta-analysis, the authors acknowledged methodologic flaws in many of the assessed studies and a high concern for bias in slightly more than half of them. They also restricted the analysis to English-language studies published after 2018. Dr. Furniss and colleagues noted that nine studies had developed and externally validated AI algorithms.
“External validation of AI classification tools is key because deep learning algorithms may perform well with the data on which they were trained but then show lower performance metrics with a validation set made of completely independent observations,” wrote Jeremie F. Cohen, MD, PhD, a professor in the Department of General Pediatrics and Pediatric Infectious Diseases at Necker-Enfants Malades Hospital at the Universite de Paris in France, and Matthew D.F. McInnes, MD, PhD, FRCPC, a professor of Radiology-Epidemiology at the University of Ottawa, in an accompanying editorial.
Dr. Furniss and colleagues noted the adjunctive potential of AI in streamlining fracture diagnosis and possibly prioritizing areas of interest for radiologists. However, they maintained that AI is not a substitute for clinical workflow and called for future studies to validate AI algorithms in clinical settings.
“External validation and evaluation of algorithms in prospective randomized trials is a necessary next step toward clinical deployment,” emphasized Dr. Furniss and colleagues.