Speech recognition need not slow reporting time

Accuracy depends on correct use of the microphone, language model software, and dictation style

By: David L. Weiss, M.D.

Although speech recognition (SR) for radiology dictation has become more mainstream over the last several years, some radiologists report increased dictation time compared with conventional transcription.1-3 A number of users also express concern about distraction during dictation and the possibility that this may decrease interpretation accuracy. While not all users will achieve high accuracy rates and efficient system use, a time penalty need not be inevitable for the majority of radiologists using SR.4,5

Accuracy depends in large part on correct use of the microphone, the language model software, and the radiologist's dictation style.

  • Microphone. A noise-canceling microphone is best and has become the industry standard.6 Most radiologists move their head when viewing images. If the microphone is handheld, the distance from the mouth may vary. A headset microphone eliminates inconsistencies in position that contribute to diminished accuracy. The volume setting and sound card test in the speech engine software should be repeated each time the microphone is changed and at regular intervals even in the absence of hardware changes.

  • Language model. The modern speech engine, the software that converts the spoken word to text, no longer attempts to recognize individual words. Rather, it divides the English language into sounds called phonemes. The software attempts to recognize consecutive strings of phonemes based on a statistical model.7 Understanding this process is important for optimal dictation using SR. Dictating phrases rather than individual words and using a flowing dictation style rather than a choppy one can yield startling increases in accuracy. Correcting phrases rather than individual words also provides better results.

  • Dictation style. The language model is structured to create a document in the style of a radiology report, using medical and radiologic terminology. A user who conforms to this format will have an easier time with the system than one who dictates in lengthy prose. The speech engine tracks the dictation and pronunciation style of individual users and can update a user's profile at the end of each dictation session. The user profiles of radiologists who are consistent in their dictation style can contribute to higher accuracy rates.

    Users quickly discover a number of minor idiosyncrasies of SR. Most words can be dictated rapidly and without pauses between them if they are enunciated clearly, but certain words and phrases need more careful diction. Many users with high accuracy may not even be aware that they are conforming to these required slight pauses.

    The system favors a somewhat unusual pronunciation of certain words. In these cases, it is often easier to change the way a word is pronounced than to try to teach the system your preference. An example of this is the word "calculi." The system prefers that it be pronounced "kal-kuew-lee" rather than "kal-kuew-lie."8 And it has trouble recognizing a few words even after repeated training. An example is the word "gout"; users find it easier to use the phrase "gouty arthritis" than to have to make corrections repeatedly.
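The effect of context on recognition, described under "Language model" above, can be illustrated with a toy sketch. It is not how any commercial speech engine is implemented: the word pair, the counts, and the function name `pick` are all invented for illustration, and a real engine works on phoneme strings with a far larger statistical model. The sketch shows only why a word dictated inside a phrase is easier to resolve than the same word dictated alone.

```python
from collections import Counter

# Made-up counts standing in for a trained statistical model.
# "ileum" and "ilium" sound alike, so sound alone cannot separate them.
UNIGRAMS = Counter({"ileum": 60, "ilium": 40})
BIGRAMS = Counter({
    ("terminal", "ileum"): 50,
    ("terminal", "ilium"): 1,
    ("left", "ilium"): 30,
    ("left", "ileum"): 1,
})

HOMOPHONES = ["ileum", "ilium"]


def pick(candidates, previous_word=None):
    """Choose among sound-alike candidates, using context when available."""
    if previous_word is None:
        # Isolated word: only overall word frequency is available.
        return max(candidates, key=lambda w: UNIGRAMS[w])
    # Word inside a phrase: the preceding word narrows the choice.
    return max(candidates, key=lambda w: BIGRAMS[(previous_word, w)])


print(pick(HOMOPHONES))              # isolated word: falls back to frequency
print(pick(HOMOPHONES, "left"))      # phrase context selects "ilium"
print(pick(HOMOPHONES, "terminal"))  # phrase context selects "ileum"
```

With these invented counts, the isolated word always resolves to the more frequent spelling, while the phrase context selects the appropriate one in each case.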

    CORRECTIONS

    Corrections can be made in three basic ways, each with different implications. The quickest and easiest is simply to highlight a word or phrase and respeak it. The system will incorporate this change into the user profile at the end of the dictation session. A better correction method is to use the correction dialog box and the "train" command to teach the system a particular pronunciation of a word. This increases the speed at which the system learns and creates a steeper rise in accuracy.

    A third and even more powerful method is to use the vocabulary editor to add words or phrases to the system dictionary or to remove them. Access to this feature varies among vendors' products. Making the last two types of corrections during early use should result in higher accuracy rates, especially for non-native speakers.

    MACROS AND TEMPLATES

    Even a user with 100% accuracy will not achieve the maximum efficiency gains of SR without using macros and templates.9 Macros are stored reports that can be easily recalled with voice commands or a mouse click. Templates are simply macros with built-in blanks where numbers or variable phrases can be added by dictation. The use of macros saves time not only in dictation but also in proofreading, which becomes unnecessary when an unmodified macro is used.
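The distinction between a macro and a template can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report wording, the blank names `size` and `side`, and the helper `fill` are hypothetical and do not come from any vendor's product.

```python
from string import Template

# Hypothetical macro: a stored report recalled as is, with no blanks.
NORMAL_CHEST = "Chest, two views. The lungs are clear. Heart size is normal."

# Hypothetical template: a macro with named blanks to be filled by dictation.
RENAL_STONE = Template(
    "There is a $size mm calculus in the $side kidney. No hydronephrosis."
)


def fill(template, **blanks):
    """Fill every blank; raises KeyError if a blank is left undictated."""
    return template.substitute(**blanks)


print(NORMAL_CHEST)
print(fill(RENAL_STONE, size="4", side="left"))
# prints "There is a 4 mm calculus in the left kidney. No hydronephrosis."
```

Raising an error on an unfilled blank is a design choice made here so that an incomplete template cannot silently reach the final report.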

    Depending on the modality and institution, however, macros are seldom used in their original, unmodified state. A macro will more likely be recalled and changed slightly in the final dictation. Macros should therefore be written so that a sentence or phrase that needs changing is easy to find.

    The creation and storage of a large number of macros does not seem to cause performance deterioration. It is therefore possible to use hundreds of macros; the only limitation is remembering the various macro names.

    It is helpful to create a logical and systematic method for naming and organizing macros. One such scheme uses a hierarchical naming system that moves from the least to the most specific description. Each macro is named in the following order: modality, body part, modifier, side of body. This may vary slightly from macro to macro but will generally allow the user to simply describe a macro for recall: "CT scan, abdomen and pelvis, with contrast, female," for example; or "ultrasound, DVT, normal, right." Whatever naming system is used, it should be logical and consistent across all modalities and macros.
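The naming order above (modality, body part, modifier, side of body) can be expressed as a small sketch. The helper names `macro_name` and `matching` and the three-entry library are assumptions made for illustration; an actual SR product will store and recall macros in its own way.

```python
def macro_name(*parts):
    """Join parts in the order modality, body part, modifier, side of body."""
    return ", ".join(p.strip().lower() for p in parts if p)


def matching(library, *spoken_parts):
    """Return macro names that begin with the parts spoken so far."""
    prefix = macro_name(*spoken_parts)
    return sorted(name for name in library if name.startswith(prefix))


# Hypothetical library built from the two examples in the text plus one more.
LIBRARY = [
    macro_name("CT scan", "abdomen and pelvis", "with contrast", "female"),
    macro_name("ultrasound", "DVT", "normal", "right"),
    macro_name("ultrasound", "DVT", "normal", "left"),
]

print(matching(LIBRARY, "ultrasound", "DVT"))
# prints ['ultrasound, dvt, normal, left', 'ultrasound, dvt, normal, right']
```

Because every name moves from general to specific, a partial description narrows the list in a predictable way, which is the practical benefit of keeping the scheme consistent.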

    Two basic tenets will help improve speed and efficiency. The first is to minimize the time-consuming physical and mental tasks necessary for creating a report. These include not only the actual dictation but also the time spent in navigational tasks. The second principle is to arrange the necessary tasks so as to minimize visual keyboard or screen icon search. Doing so allows the radiologist's eyes to remain on the images where they belong.

    NAVIGATION

    One important navigational tool is the programmable button set on the handheld Philips speech microphone.10 This functionality can easily be combined with use of the recommended headset microphone. The buttons should be programmed with the most commonly used commands, such as "start/stop dictation," "sign off," "delete," and "enter," but choice of commands varies among users (Figure 1). Experimentation to find the right combination and configuration for microphone buttons will result in more rapid and intuitive use during dictation. This can minimize the mental energy needed for navigation, which can be redirected toward greater concentration on image interpretation.

    Taking time to explore different ways of navigating pays off in easier system use. Commands such as "insert before" or "select next sentence" make navigation quicker. A list of available commands can be accessed on the fly, using the "what can I say" command.

    Coordinating the SR system with a PACS or radiology information system decreases the need for many navigational commands. Passing the accession number from the RIS to SR eliminates data entry and minimizes human error. The integration of PACS and SR allows the press of a single button to sign off a radiology report and simultaneously launch the next case on the PACS work list.11
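The single-button sign-off described above can be sketched as follows. This is a simplified model under stated assumptions, not the interface of any PACS, RIS, or SR product: the classes `Study` and `Session`, the method `sign_off_and_next`, and the accession numbers are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Study:
    accession: str      # passed from the RIS, never typed by the radiologist
    signed: bool = False


class Session:
    """Toy stand-in for an integrated PACS/SR work list."""

    def __init__(self, worklist):
        self.pending = deque(worklist)
        self.signed = []
        self.current = self.pending.popleft() if self.pending else None

    def sign_off_and_next(self):
        """One action: sign the current report, then open the next study."""
        if self.current is not None:
            self.current.signed = True
            self.signed.append(self.current)
        self.current = self.pending.popleft() if self.pending else None
        return self.current


session = Session([Study("ACC001"), Study("ACC002")])
next_study = session.sign_off_and_next()
print(next_study.accession)  # prints "ACC002"
```

Folding the two steps into one call mirrors the point of the integration: the radiologist issues one command, and no accession number is re-entered by hand.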

    A number of companies are developing structured reporting, a method of radiology reporting that uses pull-down menus with a point-and-click system to create radiology reports. Many radiologists already use this technology for modalities such as mammography, and in some cases it is being implemented department-wide. It should become more popular as the radiology lexicon becomes more standardized.12 Structured reporting also allows automated coding and billing and is ideally suited for data mining.

    The disadvantages are similar to those of speech recognition and include a possible time penalty and a potential distraction from viewing images. With a point-and-click system, the radiologist's eyes must move from the images for a period of time, although the addition of voice activation alleviates this problem. Structured reporting may also limit the creativity of radiology dictation; some radiologists view this as a negative, but others consider it quite the opposite.

    The ultimate efficient reporting system may comprise a combination of speech recognition and structured reporting. A number of vendors are considering ways to take advantage of the potential synergy of the two methods.

    Despite the use of speech recognition, radiology reporting methods have changed little in the last 100 years.13 A radiologist typically views images in the traditional manner while narrating a descriptive prose report. Both speech recognition and structured reporting can be integrated with current image interpretation software to import data into predefined templates and present the radiologist with a complete or near-complete report for review and finalization.

    In the simplest of cases, patient history, imaging technique, and the date of a prior study could be passed from the RIS, modality, and PACS to the correct field in the report (Figure 2).

    In more sophisticated circumstances, measurements made by the technologist or radiologist could be imported. These could be compared with measurements from the prior study and changes automatically inserted into the report. Fetal ultrasound imaging is ideally suited for this type of reporting (Figure 3). At the next level, computer-assisted diagnosis (CAD) data could be presented for the radiologist's approval. Once accepted, the appropriate text report would be automatically presented for review and finalization.
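Automatic comparison with a prior measurement might look something like the sketch below. The function name `describe_change`, the sentence wording, and the sample values are assumptions for illustration only; a production system would take its measurements from the modality or PACS and would still present the text to the radiologist for review.

```python
def describe_change(label, current_mm, prior_mm):
    """Build a report sentence comparing a measurement with the prior study."""
    delta = current_mm - prior_mm
    if delta == 0:
        return f"{label} measures {current_mm:g} mm, unchanged from the prior study."
    direction = "increased" if delta > 0 else "decreased"
    return (
        f"{label} measures {current_mm:g} mm, {direction} from "
        f"{prior_mm:g} mm on the prior study."
    )


print(describe_change("The lesion", 6, 4))
# prints "The lesion measures 6 mm, increased from 4 mm on the prior study."
```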

    MULTIMEDIA REPORTING

    Multimedia reporting, another feature under development, is accomplished by linking a word or phrase within a report to the appropriate digital image being described. The clinician is presented with a text report in which the hyperlinked text is shown in a different font. A thumbnail image can be included in the report as well. Double-clicking on the hyperlinked word or the thumbnail opens the full-resolution image with the appropriate saved annotations. Clinicians need only download the images they wish to review.

    Information contained in the report can be exported back to the RIS for coding and billing purposes as well as data mining. In a basic example, the BI-RADS category assigned by the radiologist during mammography interpretation can be used by the RIS for tracking purposes and to generate a letter to the patient. Structured reporting is ideally suited to perform these functions, but some speech recognition systems can be configured for this as well.
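For a free-text report, exporting the BI-RADS category could be approximated with a simple pattern match, as in this sketch. The pattern and the function name `birads_category` are assumptions: they expect the category to be dictated as "BI-RADS" followed by a single digit, optionally preceded by the word "category". A structured report would carry the category as a discrete field and would not need this step.

```python
import re

# Assumed dictation pattern: "BI-RADS 2" or "BI-RADS category 2".
BIRADS = re.compile(r"BI-RADS\s+(?:category\s+)?([0-6])\b", re.IGNORECASE)


def birads_category(report_text):
    """Return the BI-RADS category as an int, or None if none is stated."""
    match = BIRADS.search(report_text)
    return int(match.group(1)) if match else None


print(birads_category("Impression: BI-RADS category 2, benign."))  # prints 2
print(birads_category("Impression: no category stated."))          # prints None
```

Returning `None` rather than a default value lets the receiving system flag a report with no recognizable category instead of recording a wrong one.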

    The reported time penalty in the use of speech recognition need not be a given. Accuracy can be increased for most users by applying the techniques described above. The liberal use of macros and templates reduces time spent in dictation and proofreading. Streamlining navigational tasks using system features, programmable microphone buttons, and coordination with PACS and RIS saves further time and mental energy. The combination of speech recognition and structured reporting uses the best features of each system and minimizes their respective disadvantages. Sharing data between image interpretation software and the reporting system further maximizes the efficiency and speed of reporting and should provide clinicians with an improved radiology report.

    DR. WEISS is clinical section head of imaging informatics at Geisinger Medical Center in Danville, PA. This article is based on a presentation at the 2003 Symposium for Computer Applications in Radiology. Dr. Weiss is a consultant for Agfa/Talk Technology.

    References

    1. Hayt DB, Alexander S. The pros and cons of implementing PACS and speech recognition systems. J Digital Imaging 2001;14:149-157.

    2. Gale B, Safriel Y, Lukban A, et al. Radiology report production times: voice recognition vs. transcription. Radiol Management 2001;23:18-22.

    3. Hundt W, Stark O, Scharnberg B, et al. Speech processing in radiology. Eur Radiol 1999;9:1451-1456.

    4. Langer S. Impact of tightly coupled PACS/speech recognition on report turnaround time in the radiology department. J Digital Imaging 2002;15(suppl):234-236.

    5. Langer S. Impact of speech recognition on radiologist productivity. SCAR University syllabus 2002;154-159.

    6. Wickstom TK. Gradient headset microphones-history and performance. Emkay Innovative Products 2002 Jan. 24. Article available online at www.arraymicrophones.com/html/wp_8_99.htm.

    7. Makhoul J, Schwartz R. State of the art in continuous speech recognition. Proc Natl Acad Sci USA 1995;92:9956-9963.

    8. Weiss DL. Speech recognition: evaluation, planning, installation, and use: purchasing a speech recognition system. In: Reiner B, Siegel E, Weiss D, eds. Electronic reporting in the digital medical enterprise. Great Falls, VA: Society for Computer Applications in Radiology 2003:27-42.

    9. Sistrom CL, Honeyman JC, Mancuso A, Quisling RG. Managing predefined templates and macros for a departmental speech recognition system using common software. J Digital Imaging 2001;14:131-141.

    10. Weiss DL. Speech recognition technology improves workflow efficiency. Diagnostic Imaging 2002;24:48-51.

    11. Weiss DL, Hoffman J, Kustas G. Integrated voice recognition and picture archiving and communications system: development and early experience. J Digital Imaging 2001;14(suppl 1):233-235.

    12. Langlotz C. Automatic structuring of radiology reports: harbinger of a second information revolution in radiology. Radiology 2002;224:5-7.

    13. Reiner BI, Siegel EL, Shastri K. The future of radiology reporting. In: Reiner B, Siegel E, Weiss D, eds. Electronic reporting in the digital medical enterprise. Great Falls, VA: Society for Computer Applications in Radiology 2003:83-104.

