Emerging research suggests that the large language model (LLM) GPT-4o may enable automated protocoling for abdominal and pelvic computed tomography (CT) scans.
In the retrospective study, recently published in Radiology, researchers compared prompting-only and fine-tuned GPT-4o (OpenAI) models against unassisted radiologist protocol selection for abdominal and pelvic CT. The cohort comprised 1,448 patients (mean age of 61) who had abdominal or pelvic CT scans, according to the study.
In the internal test set of 548 patients, the prompting-only GPT-4o model selected optimal CT protocols for 527 patients (96.2 percent). The study authors found that unassisted radiologists had optimal protocoling for 484 patients (88.3 percent).
The researchers also noted no statistically significant difference in inappropriate CT protocol selection between the prompting-only GPT-4o model (1.3 percent) and unassisted radiologists (2.4 percent).
“Optimization of GPT-4o with detailed prompting alone, after context engineering, enabled the selection of optimal protocols more frequently than the current standard of care. Our findings demonstrate the ability of LLMs to accurately follow lengthy and complex instructions in a subspecialty domain,” wrote study co-author Rajesh Bhayana, M.D., an assistant professor of radiology and radiologist technology lead in the Joint Department of Imaging at the University of Toronto, and colleagues.
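The "prompting-only" approach the authors describe pairs detailed institutional instructions with each imaging requisition, rather than fine-tuning the model. A minimal sketch of how such a step might look with the OpenAI Python SDK appears below; the protocol list, prompt wording, and requisition text are illustrative assumptions, not the study's actual pipeline or prompt.

```python
# Illustrative sketch of prompting-only CT protocol selection.
# The protocol options and prompt text below are hypothetical examples,
# not the study authors' actual instructions.

PROTOCOLS = [
    "CT abdomen/pelvis with IV contrast",
    "CT abdomen/pelvis without IV contrast",
    "Multiphase liver CT",
    "CT urogram",
]

def build_protocoling_prompt(requisition: str) -> str:
    """Assemble a detailed instruction prompt ("context engineering")
    that pairs institutional protocol rules with the clinical requisition."""
    options = "\n".join(f"- {p}" for p in PROTOCOLS)
    return (
        "You are assisting with abdominal and pelvic CT protocol selection.\n"
        "Choose exactly one protocol from the list below and reply with its "
        "name only.\n\n"
        f"Available protocols:\n{options}\n\n"
        f"Imaging requisition:\n{requisition}"
    )

def select_protocol(requisition: str) -> str:
    """Send the assembled prompt to GPT-4o and return its protocol choice.
    Requires the 'openai' package and an OPENAI_API_KEY in the environment."""
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": build_protocoling_prompt(requisition)}
        ],
        temperature=0,  # deterministic selection rather than creative output
    )
    return response.choices[0].message.content.strip()
```

In practice, the "lengthy and complex instructions" the authors credit would replace the short rule list here with full institutional protocoling criteria, contrast contraindications, and edge-case guidance.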
The study authors also compared the prompting-only GPT-4o model's CT protocol selection against radiologists of different experience levels. In internal testing, the LLM matched the protocoling reference standard at a nearly 12 percentage point higher rate than radiologists (91.3 percent vs. 79.4 percent) and selected optimal protocols at a 6 percentage point higher rate (95.4 percent vs. 89.4 percent).
Three Key Takeaways
• Prompting-only GPT-4o exceeded radiologist performance for CT protocoling. In internal testing, the LLM selected optimal abdominal and pelvic CT protocols in 96.2 percent of cases versus 88.3 percent for unassisted radiologists, with no significant difference in inappropriate protocol selection rates.
• Potential to standardize protocoling across experience levels. GPT-4o demonstrated higher concordance with the reference standard than attendings, fellows, and residents, with particularly large gains among trainees, suggesting a role in reducing variability and supporting less-experienced readers.
• Workflow efficiency without added safety tradeoffs. Comparable inappropriate protocol rates and strong adherence to complex instructions indicate that LLM-based automated protocoling could reduce radiologist time spent on non-interpretive tasks while maintaining protocol quality, though real-world integration will need to address variability in requisition data and EMR information.
Compared with radiology fellows, the prompting-only GPT-4o model offered an over 15 percentage point higher match with the protocol reference standard (90 percent vs. 74.9 percent) and a greater than 7 percentage point improvement in optimal protocol selection (95.4 percent vs. 87.7 percent). Compared with residents, the study authors noted a nearly 19 percentage point higher match with the reference standard for the prompting-only GPT-4o model (91 percent vs. 72.1 percent) and an over 11 percentage point higher rate of optimal CT protocol selection (99.1 percent vs. 87.4 percent).
“ … LLMs could facilitate widespread automated protocoling, which could significantly improve workflow and reduce radiologist time spent on non-interpretive tasks,” maintained Bhayana and colleagues.
(Editor’s note: For related content, see “Nine Takeaways from New Consensus on Abdominal Photon-Counting CT Protocols in Adults,” “Clinical Applications of LLMs in Radiology: Key Takeaways from RSNA 2025” and “Current and Emerging Concepts with LLMs in Radiology: An Interview with Rajesh Bhayana, MD.”)
In regard to study limitations, the authors conceded that variability in the language and clinical information provided in imaging requisitions may lead to subjective interpretation. The researchers also acknowledged that instances involving direct conversations with clinicians and additional information from electronic medical records were not addressed in the study.