Workflow picks up benefits from speech recognition tools

September 1, 2005

Many radiology departments in hospitals and private imaging centers are going digital with the installation of a radiology information system (RIS) and/or picture archiving and communications system (PACS). Departments seem to place less emphasis, however, on automated speech recognition as part of the digital radiology enterprise.

Speech recognition should be regarded as an important component of a RIS/PACS. It can improve radiologists' productivity and reduce operating costs in the imaging department, while promising greater patient safety through more accurate reporting.

Total report turnaround time, defined as the time between completion of an imaging study and the radiologist's sign-off of the final report, is usually measured in hours if not days. Turnaround is longer still in teaching hospitals, where adding a resident or fellow to the workflow increases it substantially.
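The definition above reduces to a subtraction of two timestamps. A minimal Python sketch, assuming the RIS records a study-completion time and a final sign-off time for each report; the function names and the example timestamps are hypothetical and for illustration only:

```python
from datetime import datetime, timedelta


def report_turnaround(study_completed: datetime, report_signed: datetime) -> timedelta:
    """Total report turnaround: study completion to the radiologist's final sign-off."""
    if report_signed < study_completed:
        raise ValueError("sign-off cannot precede study completion")
    return report_signed - study_completed


def turnaround_hours(study_completed: datetime, report_signed: datetime) -> float:
    """Turnaround expressed in hours, the unit the text says is typical."""
    return report_turnaround(study_completed, report_signed).total_seconds() / 3600


# Hypothetical study completed one morning and signed the following afternoon.
completed = datetime(2005, 9, 1, 9, 30)
signed = datetime(2005, 9, 2, 15, 0)
print(turnaround_hours(completed, signed))  # 29.5
```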

Reporting is often done in batches, which is generally the most efficient approach when cassette tapes are used. Reports are also checked and approved in batches, and the original studies must be retrieved again at this stage to ensure that each report matches its images.

Workflow in the radiology department at Tan Tock Seng Hospital was almost entirely manual before the installation of a RIS. We dictated studies into microcassette recorders. Secretaries transcribed these verbal reports into a rudimentary text-entry database. Reports for proofreading were printed on scraps of recycled paper so radiologists could ink in corrections. Because this process frequently occurred more than a day after the study was dictated, radiologists had to retrieve study images to confirm their findings or rely on notes.

Reports were printed on dot-matrix printers because a single "no-carbon required" paper form was used for both the request and report. One copy was dispatched to the referring physician, and the other was stored in the radiology department as a backup.

Introduction of an integrated RIS/PACS together with speech recognition software led to a number of workflow changes. Because dictation with automatic speech-to-text translation occurs at the reporting workstation, we have released or redeployed our transcriptionists. Reports are available for immediate review, editing, and sign-off. Reporting and approval are now a single process. Removing the multiple workflow steps and bottlenecks has resulted in faster report turnaround.
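The change can be pictured as a shortening of the chain of hand-offs between study completion and sign-off. A minimal Python model contrasting the manual workflow described earlier with the integrated one; the step names are illustrative labels, not fields from any RIS product:

```python
from enum import Enum, auto
from typing import List


class Step(Enum):
    STUDY_COMPLETED = auto()
    DICTATED = auto()      # onto microcassette
    TRANSCRIBED = auto()   # typed up by a secretary
    PROOFREAD = auto()     # corrections inked onto a printout
    SIGNED = auto()


# Manual workflow: each hand-off is a separate step, and reports may wait
# in a batch queue between steps.
MANUAL_WORKFLOW: List[Step] = [
    Step.STUDY_COMPLETED, Step.DICTATED, Step.TRANSCRIBED, Step.PROOFREAD, Step.SIGNED,
]

# Integrated RIS/PACS with speech recognition: dictation, review, editing,
# and sign-off happen in one sitting at the workstation.
SPEECH_WORKFLOW: List[Step] = [Step.STUDY_COMPLETED, Step.SIGNED]


def handoffs(workflow: List[Step]) -> int:
    """Number of transitions a report passes through before final sign-off."""
    return len(workflow) - 1


print(handoffs(MANUAL_WORKFLOW), handoffs(SPEECH_WORKFLOW))  # 4 1
```

Each hand-off in the manual chain is a point where batching can add delay, which is why collapsing them shortens turnaround.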


Based on our experience in implementing speech recognition technology, we have several recommendations for selecting a system. Our suggestions will, hopefully, prove useful to others about to embark on the same journey.

A speech recognition system must function as part of the RIS/PACS. A single login should launch all three applications simultaneously. The three should all run on the same workstation, enabling a seamless flow of information. A drawback to integration of the speech recognition module into the entire system, however, is that a failure in the speech recognition software or hardware may prevent use of the RIS for reporting.

While speech recognition should be encouraged as the main method of text input, staff should also be able to type text directly (preferably with formatting) or to use a digital dictation mode in which a transcriptionist performs the translation. The latter option, however, gives up many of the system's efficiency gains.
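Taken together, the integration drawback and the alternative input modes suggest designing the reporting screen so that a failure in the speech module degrades to another input mode instead of blocking reporting. A minimal Python sketch of that fallback logic; the mode names, health flags, and preference order are hypothetical and do not describe any vendor's API:

```python
from enum import Enum
from typing import List, Optional


class InputMode(Enum):
    SPEECH_RECOGNITION = "speech recognition"
    TYPED = "direct typing"
    DIGITAL_DICTATION = "digital dictation for a transcriptionist"


def available_modes(speech_software_ok: bool, microphone_ok: bool) -> List[InputMode]:
    """Input modes that still work, in an assumed order of preference."""
    modes: List[InputMode] = []
    if speech_software_ok and microphone_ok:
        modes.append(InputMode.SPEECH_RECOGNITION)  # needs both software and hardware
    modes.append(InputMode.TYPED)  # keyboard entry does not depend on the speech module
    if microphone_ok:
        modes.append(InputMode.DIGITAL_DICTATION)  # works, but loses most efficiency gains
    return modes


def choose_input_mode(speech_software_ok: bool, microphone_ok: bool,
                      preferred: Optional[InputMode] = None) -> InputMode:
    """Honour the user's preferred mode if it is available; otherwise fall back."""
    modes = available_modes(speech_software_ok, microphone_ok)
    if preferred is not None and preferred in modes:
        return preferred
    return modes[0]


# Speech engine down but microphone fine: reporting continues by typing.
print(choose_input_mode(speech_software_ok=False, microphone_ok=True).value)  # direct typing
```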

The system should include standard productivity tools, such as spell checking with a language dictionary of choice, user-specific private reporting templates, and department-wide access to standard reporting templates. A macro and template input capability is mandatory, as it can increase reporting speed even for radiologists who achieve 100% accuracy in speech recognition. The function should be available via both text and voice input. Pressing a button, clicking a mouse, or using a key combination should convert the macro text into the full report template. Because the template text is prewritten, it needs no proofreading, and reports can be signed almost immediately.
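Macro expansion is essentially a lookup from a short trigger phrase to a stored template, with private templates overriding shared ones. A minimal Python sketch, assuming templates are stored under lower-case names; the template text and function name are hypothetical illustrations, not any product's feature set:

```python
from typing import Dict, Optional

# Hypothetical department-wide standard templates, keyed by lower-case macro name.
DEPARTMENT_TEMPLATES: Dict[str, str] = {
    "normal chest": (
        "The lungs are clear. The heart size is normal. "
        "No pleural effusion or pneumothorax is seen."
    ),
}


def expand_macro(trigger: str, private_templates: Optional[Dict[str, str]] = None) -> str:
    """Return the full template text for a macro name typed or spoken by the user.

    A user's private templates (also keyed in lower case) take precedence
    over department-wide ones.
    """
    key = " ".join(trigger.lower().split())  # normalise case and spacing
    templates = {**DEPARTMENT_TEMPLATES, **(private_templates or {})}
    if key not in templates:
        raise KeyError(f"no template named {trigger!r}")
    return templates[key]


# The radiologist says or types "normal chest", then presses the expand key.
print(expand_macro("Normal chest"))
```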

Proponents of speech recognition disagree on whether handheld microphones or boom-type headsets are preferable, but any good-quality, noise-canceling microphone should be sufficient. A handheld microphone can integrate many other functions into a single device through shortcut keys for navigation, control, and reporting. Handheld microphones can also incorporate barcode scanners.

A headset microphone, on the other hand, leaves both hands free for other tasks such as handling a computer mouse, picking up a requisition form, and perhaps operating a separate barcode scanner. A headset also keeps the microphone at a constant distance from the speaker's mouth. With a handheld microphone, turning to look at images can change that distance, causing fluctuations in the sound volume reaching the microphone and, potentially, translation errors.
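The effect of distance can be put in numbers. Under an idealized free-field assumption (a point source with no room reflections, which speech at close range only approximates), sound pressure falls in inverse proportion to distance, so each doubling of the mouth-to-microphone distance lowers the level by about 6 dB. A short Python sketch of that calculation; the distances in the example are illustrative, not measurements:

```python
import math


def level_change_db(d_from_cm: float, d_to_cm: float) -> float:
    """Change in sound pressure level (dB) when the mouth-to-microphone distance
    changes, using the idealized free-field inverse-distance law (p proportional to 1/r)."""
    if d_from_cm <= 0 or d_to_cm <= 0:
        raise ValueError("distances must be positive")
    return 20 * math.log10(d_from_cm / d_to_cm)


# Illustrative only: a handheld microphone drifting from 3 cm to 12 cm away.
print(round(level_change_db(3, 12), 1))  # -12.0
```

A headset that holds the distance fixed keeps this change at zero, which is the advantage described above.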

Regardless of microphone choice, voice-activated commands should help speed up workflow, letting users navigate the system without repeatedly switching between input devices.
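Conceptually, voice commands sit in front of dictation: an utterance that matches a registered command phrase triggers an action, and anything else is passed through as report text. A minimal Python sketch of that routing; the class, command phrases, and actions are hypothetical, and real products use their own command grammars:

```python
from typing import Callable, Dict


class VoiceCommandRouter:
    """Maps recognized command phrases to actions; other utterances are dictation."""

    def __init__(self) -> None:
        self._commands: Dict[str, Callable[[], str]] = {}

    @staticmethod
    def _normalise(phrase: str) -> str:
        return " ".join(phrase.lower().split())

    def register(self, phrase: str, action: Callable[[], str]) -> None:
        self._commands[self._normalise(phrase)] = action

    def handle(self, utterance: str) -> str:
        action = self._commands.get(self._normalise(utterance))
        if action is None:
            return f"DICTATION: {utterance}"  # not a command, so treat it as report text
        return action()


router = VoiceCommandRouter()
router.register("next study", lambda: "opening next study on worklist")
router.register("sign report", lambda: "report signed")
print(router.handle("Sign Report"))            # report signed
print(router.handle("No acute abnormality."))  # DICTATION: No acute abnormality.
```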

All major vendors of speech recognition systems support continuous (natural) speech recognition, which lets users speak in a natural rhythm rather than pausing between words. This method relies on recognition of strings of "phonemes" instead of individual words. Phonemes are the basic units of sound that distinguish one word from another; the English language has about 40 to 45 of them. Working at the phoneme level spares the system from having to handle the varied pronunciations of several thousand whole words. Phrases can be recognized more easily and translated more accurately. Corrections should, as far as possible, be made to whole phrases rather than single words to further improve the system's accuracy.
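To make the phoneme idea concrete: once a string of phonemes has been recognized, it must be matched against a pronunciation lexicon to produce words. The toy Python sketch below assumes a tiny hand-written lexicon in ARPAbet-style notation (stress marks omitted) and uses "fewest words" as a crude stand-in for the statistical language models that commercial engines actually use. It illustrates the principle only and does not describe any vendor's product:

```python
from typing import Dict, List, Optional, Tuple

# Toy pronunciation lexicon (hypothetical subset, ARPAbet-style symbols).
LEXICON: Dict[str, Tuple[str, ...]] = {
    "no": ("N", "OW"),
    "acute": ("AH", "K", "Y", "UW", "T"),
    "fracture": ("F", "R", "AE", "K", "CH", "ER"),
    "a": ("AH",),
    "cute": ("K", "Y", "UW", "T"),
}


def segment(phonemes: List[str]) -> Optional[List[str]]:
    """Split a phoneme string into lexicon words, preferring the fewest words.

    Dynamic programming: best[i] holds the best word sequence covering
    phonemes[:i], or None if that prefix cannot be segmented.
    """
    n = len(phonemes)
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for word, pron in LEXICON.items():
            start = end - len(pron)
            prefix = best[start] if start >= 0 else None
            if prefix is None or tuple(phonemes[start:end]) != pron:
                continue
            candidate = prefix + [word]
            current = best[end]
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[n]


print(segment("N OW AH K Y UW T F R AE K CH ER".split()))  # ['no', 'acute', 'fracture']
```

The ambiguous pair "acute" and "a cute" share the same phonemes; only context decides between them, which is one reason phrase-level recognition and phrase-level correction help.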

Radiologists are advised to complete the training or startup modules in the speech recognition software diligently, as these modules form the basis of each user's speech profile. Those who bypass them will find recognition accuracy low and the system frustrating to use. Users should also be aware that every system has idiosyncrasies, and they may need to pronounce certain words in an apparently nonintuitive fashion to ensure accuracy.
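Recognition accuracy is commonly quantified as word error rate (WER): the word-level edit distance (substitutions, deletions, and insertions) between what was said and what the system produced, divided by the number of words actually spoken. A Python sketch that could be used to compare a user's accuracy before and after completing the training modules, assuming a correct reference transcript is available; the sample sentences are illustrative:

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein (edit distance) alignment."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return dist[len(ref)][len(hyp)] / len(ref)


# One substitution plus one insertion over five spoken words.
print(word_error_rate("no acute fracture is seen", "no a cute fracture is seen"))  # 0.4
```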

Appropriate reading room design is another important consideration. Although noise-canceling microphones substantially reduce background noise during dictation, individual reporting stations must be separated from one another by physical barriers. Partitions are probably the best compromise, given that few centers will be able to provide a soundproof room for each RIS/PACS workstation.

Telephones or intercom systems can be muted or even removed completely from communal reporting rooms. Routine calibration of microphones is essential, especially if workstations are moved to another location where the acoustic environment is different.
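One part of microphone calibration is checking that the speech level reaching the microphone sits within the range the recognizer expects, and adjusting the input gain if it does not. A minimal Python sketch, assuming normalized audio samples (-1.0 to 1.0) captured while the user reads a test passage; the function names, the -20 dBFS target, and the 3 dB tolerance are illustrative assumptions, not any vendor's settings:

```python
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level of normalized samples in dB relative to full scale (dBFS)."""
    if not samples:
        raise ValueError("no samples")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return float("-inf") if rms == 0 else 20 * math.log10(rms)


def gain_adjustment_db(samples: Sequence[float], target_dbfs: float = -20.0,
                       tolerance_db: float = 3.0) -> float:
    """Suggested input-gain change in dB; 0.0 if already within tolerance.

    target_dbfs and tolerance_db are assumed illustrative values.
    """
    level = rms_dbfs(samples)
    if level == float("-inf"):
        raise ValueError("silence: check that the microphone is connected")
    diff = target_dbfs - level
    return 0.0 if abs(diff) <= tolerance_db else diff


# A full-scale test signal is far too loud for the assumed target.
print(gain_adjustment_db([1.0, -1.0] * 100))  # -20.0
```

Rerunning such a check after a workstation is moved captures the change in acoustic environment mentioned above.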


The pronunciation of words varies greatly among English-speaking countries. Although speech recognition software can generally adapt with use, it is wise to ensure that the installed language models and dictionaries match the user's variety of English.

English speakers have a choice between U.K. and U.S. models. Members of our department, who are neither British nor North American, found that they had to alter their intonation and accent when reporting. This is clearly not ideal.

Speech recognition is not limited to the English-speaking market, however. Many systems are implemented in Asia and Europe. Direct comparison of recognition accuracy between alternative language models is difficult in practice, but some relevant data are available. The U.S. National Institute of Standards and Technology (NIST) has run formal evaluations of automatic language recognition since 1996. In NIST's 2003 evaluation, error rates were lowest for English and Vietnamese, and most other languages tested, including Tamil, Korean, Mandarin, and Japanese, also fared well.

Speech recognition technology can shift radiologists' attention back where it belongs: looking at and interpreting radiological images. The latest software packages contain enhancements such as macros and templates that further improve the speed and accuracy of reports. Voice-driven commands help reduce the need to switch between various computer input devices. Availability of this technology is not limited to English-speaking or European countries; speakers of major Asian languages can benefit as well.

Suggested Reading

Harisinghani MG, Blake MA, Saksena M, et al. Importance and effects of altered workplace ergonomics in modern radiology suites. Radiographics 2004;24(2):615-627.

Houston JD, Rupp FW. Experience with implementation of a radiology speech recognition system. J Digit Imaging 2000;13(3):124-128.

Jones A. Making speech recognition work.

Langer SG. Impact of speech recognition on radiologist productivity. J Digit Imaging 2002;15(4):203-209.

Martin AF, Przybocki MA. NIST 2003 language recognition evaluation.

Mehta A, McLoud TC. Voice recognition. J Thorac Imaging 2003;18(3):178-182.

Reiner BI, Siegel EL, Weiss DL. SCAR University primer 4: Electronic reporting in the digital medical enterprise. Great Falls, Virginia: Society for Computer Applications in Radiology, 2003.

Schweitzer A. Voice recognition in radiology: a technology overview. Radiol Manag 2001;23(1):41-49.

Viau M. What works. Dictate this. Speech recognition brings workflow improvements and cost savings to a Florida-based hospital. Health Manag Technol 2002;23(11):50, 54.

Dr. Goh is a consultant radiologist and PACS-RIS manager, and Dr. Tsou is a consultant radiologist, both at Tan Tock Seng Hospital in Singapore.