Newest generation of speech recognition faces fresh tasks

September 1, 2006

Role of second-generation systems expands from functioning merely as an element in workflow to overseeing the process, coordinating PACS and RIS

Speech recognition (SR) systems have been around longer than PACS, but the technology has remained mostly static for the past decade, having only a marginal impact on workflow. Most large radiology departments use SR in some form, but first-generation systems are often simply grafted onto existing work processes.

That changes in SR's next generation, currently incubating at Massachusetts General Hospital. Second-generation SR manages overall workflow for the entire radiology department, a feature born more out of necessity than ambition. Current PACS and RIS solutions are too provincial.

"Any other workflow agent, whether it's a RIS or a PACS, can manage only its own workflow," said Keith J. Dreyer, D.O., Ph.D., an assistant professor of radiology at Harvard Medical School and vice chair of radiology informatics at MGH. "The only way for a radiologist to know for sure such details as which cases have been read is to go to the reporting system, because only the reporting system knows whether someone is dictating, not dictating, or finished dictating."

Since SR was the only oracle of case truth in this situation, and since it also has the ability to connect to every other PACS or RIS within reach, MGH decided to reinvent SR to act as central workflow manager. Recasting SR as workflow manager solves several problems.

One is inaccurate dictation status. Typically, PACS knows who is dictating a case because the user presses a toggle, but this method is prone to user error. If a user dictates without toggling, PACS will offer the case to other radiologists for dictation, potentially resulting in a duplicate read. Worse, if a user toggles the dictation switch and then leaves without dictating, PACS will mark the case as read, and it will not appear on other work lists.

"These two situations happen over 100 times a day at MGH, and they need to be manually corrected by clerical staff," Dreyer said.

Another problem relates to multiple visualization systems. If a site has multiple ways to view images (conventional PACS or multiple 3D visualization, thin-client, cardiac visualization, PET/CT, nuclear medicine, or remote hospital interpretation systems), there is no way for a single RIS or PACS to manage the workflow.

With SR as the orchestrator, a department can oversee the workflow and work lists of all of these systems, Dreyer said.

A third issue addressed by the SR workflow orchestrator is input from multiple RIS. When a single group of radiologists reads for more than one hospital or imaging center, or receives cases through teleradiology, no single RIS or PACS can give the radiologists a common view of their workflow, even for something as fundamental as which cases need dictation. With an external SR workflow manager, radiologists can be given a common view of all of these disparate systems.
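The idea of a reporting system acting as the single source of truth for case status can be illustrated with a minimal Python sketch. It is not the MGH or Commissure implementation; the event names, source names, and accession numbers are hypothetical, and a real system would receive such events over its own interfaces. The sketch derives each case's status from what actually happened in the reporting system rather than from a manual toggle, then builds one work list across several sources.

```python
from enum import Enum


class Status(Enum):
    UNREAD = "unread"
    IN_PROGRESS = "in_progress"
    FINAL = "final"


def dictation_status(events):
    """Derive a case's status from reporting-system events, oldest first.

    The event names are hypothetical placeholders.
    """
    status = Status.UNREAD
    for event in events:
        if event == "dictation_started":
            status = Status.IN_PROGRESS
        elif event == "dictation_abandoned" and status is Status.IN_PROGRESS:
            status = Status.UNREAD  # the case returns to the work list
        elif event == "report_signed":
            status = Status.FINAL
    return status


def common_worklist(cases_by_source, events_by_case):
    """List every case, from every source, that still needs dictation."""
    pending = []
    for source, accessions in cases_by_source.items():
        for accession in accessions:
            events = events_by_case.get((source, accession), [])
            if dictation_status(events) is Status.UNREAD:
                pending.append((source, accession))
    return sorted(pending)


# Hypothetical example: two sources feeding one group of radiologists.
cases = {"hospital_a_ris": ["A100", "A101"], "telerad_ris": ["T7"]}
events = {
    ("hospital_a_ris", "A100"): ["dictation_started", "report_signed"],
    ("telerad_ris", "T7"): ["dictation_started", "dictation_abandoned"],
}
pending_cases = common_worklist(cases, events)
```

Because status is computed from events, a case that was opened and then abandoned falls back onto the common work list instead of being marked as read.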

FIGURES OF SPEECH

The resulting next-generation SR solution developed at MGH has been licensed to startup Commissure of New York City. Speech recognition itself was born out of necessity, with many features emerging in the MGH test bed. The Boston hospital has used SR longer than it has used PACS.

"I started tracking speech recognition 20 years ago because I realized that we struggled every day with the classic dictation, transcription, review, and signoff paradigm," said Dr. James Thrall, radiologist-in-chief at MGH, at the 2006 European Congress of Radiology.

Of the 650,000 exams performed at MGH each year, 95% of the reports are now dictated via speech recognition. The advantages are time and money, according to Thrall. SR eliminates transcripts and transcription delay. It also eliminates transcriptionists. MGH once had 25 full-time transcriptionists on its staff; it now has three. There are no lost reports and no proxy signatures required because radiologists can finalize reports contemporaneously with dictation.

"There is also greater accuracy in the final product," Thrall said.

Accuracy is a lingering issue in radiology reports, as Thrall discovered while researching one of his books.

"I needed an illustration of a case of hyperparathyroidism, so I went into our system and discovered in the old days of transcription that in just one year we had spelled the word 'hyperparathyroidism' 26 different ways, one of which was correct," he said.

To tame vocabulary inconsistency among radiologists, Dreyer devised a natural language processing (NLP) application called Leximer (Lexicon Mediated Entropy Reduction), an advanced software engine used to automatically classify unstructured radiology reports, resulting in unparalleled speech recognition accuracy (Radiology 2005;234(2):323-329. Epub 2004 Dec 10). Leximer does more than increase report accuracy, however. The intent is to help SR not just understand what the radiologist is saying and spell the words correctly, but also grasp the meaning, Dreyer said.

Leximer does this in two ways: First, it adds value to the radiologist at the time of interpretation in an automated, real-time manner just by listening to the words spoken during a dictation.

"So if a radiologist is talking about a finding in a certain anatomical area, but pathology has not been mentioned, Leximer can bring up a differential diagnosis of possible things that that finding could be," Dreyer said.

Second, the Leximer NLP is used to create features called smart templates that appear automatically in appropriate places in the report during dictation. Suppose a radiologist is dictating a chest CT report, talking about heart, lungs, ribs, and upper abdomen, Dreyer said. If he or she starts to talk about the liver, for example, the system automatically places that phrase in the upper abdomen section of the report. The system can do this instantaneously because the NLP understands, in real time, the clinical meaning of the words.

"This way, radiologists don't have to divert their attention from the images," Dreyer said. "They can just speak, and each phrase they utter finds its proper location in the smart template, through the use of Leximer NLP."

Unlike any other method of structured reporting, Leximer lets radiologists focus completely on the images while they're dictating, allowing them to speak using their individual style so they don't have to worry about following a certain cadence or, even worse, mouse clicks and menus.

"You simply speak to the findings as you see them," Dreyer said.

This inherent intelligence can drastically reduce dictation time, while providing a common and consistent reporting style regardless of the number of radiologists in the group.

"We have over 200 radiologists, each with his or her own style of reporting. But by using the same smart templates, the reports are similar in format, a feature our referring physicians appreciate," Dreyer said.

This avoids such report irregularities as findings found inside impressions, conclusions inside paragraphs with no impressions, and occasional verbosity.

"It can be difficult for someone to read through five CTs of the same patient dictated by different radiologists with differing reporting styles," Dreyer said.

The Leximer feature, also licensed to Commissure, has been running in production at MGH since early this year.
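The smart-template behavior described in this section, in which each dictated phrase lands in the right report section, can be approximated with a short Python sketch. It is only an illustration: it substitutes a simple keyword lookup for Leximer's NLP, and the section names and term lists are assumptions drawn from the chest CT example rather than from the actual product.

```python
import re

# Hypothetical lexicon: report section -> anatomy terms that belong in it.
SECTION_TERMS = {
    "heart": {"heart", "cardiac", "pericardial"},
    "lungs": {"lung", "lungs", "pulmonary", "pleural"},
    "ribs": {"rib", "ribs"},
    "upper abdomen": {"liver", "hepatic", "spleen", "adrenal"},
}
FALLBACK_SECTION = "other findings"


def route_phrase(phrase):
    """Return the template section whose terms appear in the phrase."""
    words = set(re.findall(r"[a-z]+", phrase.lower()))
    for section, terms in SECTION_TERMS.items():
        if words & terms:
            return section
    return FALLBACK_SECTION


def build_report(phrases):
    """Place each dictated phrase under its section, in dictation order."""
    report = {section: [] for section in SECTION_TERMS}
    report[FALLBACK_SECTION] = []
    for phrase in phrases:
        report[route_phrase(phrase)].append(phrase)
    return report


dictated = [
    "The lungs are clear",
    "There is a 2 cm lesion in the liver",
    "The heart size is normal",
]
report = build_report(dictated)
```

A keyword table like this breaks down quickly on synonyms, negation, and phrases that mention two organs, which is why a production system would need genuine language understanding rather than string matching.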

RED FLAG

MGH's advanced SR is also able to assign codes to reports that indicate finding severity.

"If there are findings of interest (a worrisome nodule or a follow-up recommendation), we are able to capture that in a structured format and flag it for the referring physician or for subsequent data mining, a feature we are using heavily for validation of our radiology decision support rules and for identification of phenotypic expression of genetic predispositions," Dreyer said.

This feature also helps radiologists when they dictate subsequent cases for the same patient. They can easily sift through the historical information using NLP to see which exams had significant findings.
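As a rough illustration of sifting prior reports for significant findings, the Python sketch below keeps only sentences that mention a finding and are not negated. The finding terms and negation cues are hypothetical, and the rule-based check is a stand-in for the NLP used at MGH, not a description of it.

```python
import re

# Hypothetical vocabularies; a real system would use a clinical lexicon.
FINDING_TERMS = ("nodule", "mass", "fracture", "effusion")
NEGATION_CUES = ("no", "without", "negative for")


def is_negated(sentence):
    """True if a negation cue appears as a whole word or phrase."""
    lowered = sentence.lower()
    return any(re.search(rf"\b{re.escape(cue)}\b", lowered) for cue in NEGATION_CUES)


def positive_phrases(report_text):
    """Return sentences that mention a finding and contain no negation cue."""
    sentences = [s.strip() for s in re.split(r"[.;]\s*", report_text) if s.strip()]
    positives = []
    for sentence in sentences:
        has_finding = any(term in sentence.lower() for term in FINDING_TERMS)
        if has_finding and not is_negated(sentence):
            positives.append(sentence)
    return positives


prior_report = (
    "No pleural effusion. "
    "There is a 6 mm nodule in the right upper lobe. "
    "Lungs are otherwise clear without mass."
)
synopsis = positive_phrases(prior_report)
```

Run over each prior report for a patient, a filter of this kind would produce a short list of positive statements per exam.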

"The system extracts the positive phrases, which gives you an instantaneous synopsis of the patient's historical radiologic condition," Dreyer said.

This latter aspect is in the process of deployment. MGH is just now completing work under a grant that allows the same NLP to mine the hospital's electronic medical record, Dreyer said. With this enhancement, radiologists dictating a study ordered, say, to rule out infection, can use the NLP to mine the EMR for information about that patient pertaining to the exam being performed without having to dig for it themselves.

"In this case, the system would mine for white blood cell count, prior history of infection, or previous admissions discharge summary information, allowing for more exact diagnoses of a suspicious finding," Dreyer said.

Recent advances in speech software for general personal computing could move SR further into the mainstream. The ability to speak into a microphone and see words appear on the screen or to control an application via voice command will be a feature of all PCs soon, making it a productive and reasonable alternative to keyboard and mouse.

"With the availability of new core speech engines, we are about to see breakthroughs that will make speech technology a commodity on the PC," said Commissure CEO Michael Mardini.

The challenge then will shift from how to deliver an SR solution with good accuracy to how to build an application that improves the radiology reporting process, while helping radiology professionals deliver a better product to patients and referring physicians, Mardini said. He noted there is a big difference between SR technology and the application that houses it.

"The core speech technology that converts spoken words to text in a text editor on a PC is simply a keyboard replacement," he said.

Building and implementing an application that takes advantage of real-time text to provide true workflow and service enhancements to the radiology domain is the real challenge and ultimate value to be considered, he said.

Perhaps the most compelling feature in next-generation SR is its ability to fill a hole in today's PACS, according to Mardini.

"The problem with PACS is that the communications component does not exist for the radiologist," Mardini said. "The ability of a radiologist to communicate in an efficient manner with all parties (technologists, radiologists, referring clinicians) is a big missing piece in today's environment."

Some disadvantages of first-generation SR are not cured by the next generation. SR runs on computers, and computers are completely literal.

"If misspelling is the error in human transcription, the wrong word is the error in SR," Thrall said. "I may say 'bad nerves,' and the computer may write 'bad nurse.' In SR, there is still a requirement that radiologists review the reports they send out."

MGH is in the process of enhancing its NLP to recognize these semantic errors in real time and correct them automatically.

Radiologic culture shock is also involved in SR migration, no matter which generation of the technology is installed. Radiologists will be the first to notice that the transcription function has been transferred from a secretary to the radiologists themselves.

"In the early experience, the radiologist may take a little more time to put each report together," Thrall said.

Also, the technology does not immediately benefit the end-user radiologist.

"It benefits everyone else (the patient, the referring physician, and the institution) but not necessarily the radiologist," Thrall said.

Report generation initially takes longer, so cost savings in this sense are achieved at the expense of the radiologist. Complexity is another issue. SR is another complicated system that must be integrated with everything else. And ergonomic challenges persist, particularly in the use of the microphone.

KEYS TO IMPLEMENTATION

SR's role in the future of radiology is clearly growing. Successful implementation, then, depends on several factors, including adequate project planning, the right architecture, user support, and training.

"Project planning is no different for SR than for any other project, except senior leadership is especially important," Thrall said.

System specifications also deserve serious attention. Thrall cited the need for speed.

"The faster the better," he said. "Radiologists will be frustrated if they have to wait for the system to catch up to them."

Thrall recommends a minimum network speed of 100 Mbps, though he admits that's on the low side.

System user support protocol also changes under SR. Prior to SR, transcriptionists might be onsite for only 12 to 15 hours every day. When they went home, cases stacked up. In the era of SR, a plan for operating and supporting the system 24/7 is necessary, Thrall said. Every department should also have a backup system. MGH uses simple dictating equipment as backup, because the radiologists still need to get their work done even if the system is unavailable.

"SR is certain to fail at some point," Thrall said.

He considers the most important element of implementation to be training. SR implementation success is related directly to the intensity of training, which Thrall likens to learning to play a musical instrument.

"No one understands how important training really is," he said.

MGH has put together a competency-based program for every person who uses SR that covers everything from system architecture and log-on to enrollments, mouse, and microphone functions. User and trainer must both sign off on all competencies.

"Anything less, and the user is likely to be frustrated," Thrall said.

Trainers then make office calls, watching radiologists actually use the system.

"Refresher training is indicated if we see any hesitation because they don't know all the functions," Thrall said.

New users face many challenges. The command sets are extensive. Users must dictate punctuation. Dictation patterns are new.

"The product is shorter reports with simple declarative statements, not the long-winded multiple dependent clauses that we often find in conventional radiology reports," Thrall said.

Thrall claimed that 40% of his radiologists are now power users who would "fight to the death to keep SR."

"They have mastered all functions, including all nine microphone buttons," he said.

Mr. Page is a contributing editor of Diagnostic Imaging.