Large Language Models and Clinical Reasoning: What New Research Reveals
In a recent interview, Marc Succi, MD, discussed findings from a new study examining the clinical reasoning capabilities of 21 large language models (LLMs), including GPT-5, Grok 4 and Claude 4.5 Opus.
Are large language models (LLMs) capable of reliable clinical reasoning?
In an attempt to answer this question, researchers performed a cross-sectional study assessing the clinical reasoning of 21 LLMs, including GPT-5, Gemini 3.0 Flash and Grok 4. For the research, recently published in
The researchers found that all of the reviewed LLMs had failure rates above 80 percent for differential diagnosis but below 40 percent for final diagnosis.
In a recent interview with Diagnostic Imaging, Marc Succi, MD, a co-author of the study, posited that while LLMs can be effective “when it’s an open book test with all the data,” the models struggle with decision-making when there is uncertain and disorganized data.
“I think it hits at a really important issue in why we did the study the way we did. That differential for us is really the art of medicine and coming up with a proper differential really sets the tone for the rest of the visit. If you have the wrong differential, but still get to the right answer, that also isn't okay, because that means you may have done 20 extra tests to go through the wrong differential and delayed care, extra costs, etc.,” explained Dr. Succi, an associate professor at Harvard Medical School and executive director of the MESH (Medically Engineered Solutions in Healthcare) Incubator at Mass General Brigham.
While noting that LLMs offer high feasibility and low risk for ambient documentation and radiology worklist triage, Dr. Succi maintained that LLMs are currently limited to, at most, an adjunctive role in clinical workflows.
“… It's really not whether the models can sometimes or most of the time get the answer right. It's whether it reasons reliably in an uncertain environment and with uncertain data. For me, medicine is an environment with a lot of uncertainty and a lot of high stakes. … These LLMs as they're presented, as studied, are not ready for clinical integration in a meaningful way without extensive human involvement or oversight,” emphasized Dr. Succi.