Voice Recognition: Taming the Beast

April 27, 2012

To sum up my previous thoughts regarding the state of voice recognition typically available to us in radiology: We’re not yet living in the days of Star Trek.

Assume your options for addressing this do not include dropping a small fortune on new, much more accurate software that flawlessly transcribes what you say, corrects your grammar, and actually learns from you (rather than claiming it does while making the same stupid mistakes over and aggravatingly over). Nor do they include scrapping VR and going back to human transcriptionists, typing reports yourself, or chiseling them on a stone tablet.

Tempting as it may be, I do not suggest the tack many seem to have adopted: Dictating with about 10 percent of your attention focused on the text as you interpret your images, then giving a five-second glance to the finished page-long product before hitting “sign.”

Sure, you probably won’t make too many egregious goofs. On the other hand, are you okay with the embarrassment of occasionally reporting that the left ovary is 3.4 x 2.3 x 80.2 cm? Or that there is hydronephrosis with perinephrectomy edema? How about having your impression of “No acute pathology” quietly being turned into just “Acute pathology,” and fielding the ensuing phone calls?

No, you want to pay proper attention to this stuff, and you don’t want to slow to a snail’s pace while doing it. So you obediently recalibrate your microphone, keep a quiet workspace while you dictate, etc. But that only goes so far.

I have found that, rather than trying to appease the machine in the hope that it will finally give you what you want, you are best served by giving it as little opportunity as possible to mess with you. Simply put, the less you say, the less of a target it has. Sort of like if you’re being deposed.

Most users of VR know that macros are a key means of doing this. That is, if you know that you routinely refer to three-pixel lesions as “nonspecific but statistically benign,” you stop repeating that phrase every time you see such findings. Instead, you say something like “MACRO tiny,” and your catchphrase obediently flows onto the screen. Fewer words spoken means fewer potential errors.

For a long time, I didn’t make proper use of this. I had a few stock phrases, and macros for my report templates (a header for Clinical History, another for Technical Factors, etc.). I had the Fleischner Society guidelines for whenever my study included pulmonary nodules. But I was still being driven to distraction by the other 95 percent of what I dictated.

Then I started seeing some reports in the form of checklists, for lack of a better term. Rather than being organized in prose-like paragraphs (one covering from the liver through the kidneys, the next covering the bowel, vasculature, nodes, and so on), these reports went line by line (Liver: [findings] / Spleen: [findings] / Pancreas: [findings]). Initially, this offended me. Possibly it was because I’m something of a writer, but I felt like style was lost in turning paragraphs into bare-bones lists.

And yet, when I tried it, I experienced much less of a need for editing. Rather than dictating “the liver is normal in size, configuration, and attenuation,” for instance, all I had to do was position the cursor after the Liver heading (or voice-command it: “insert after liver”), and dictate “normal.” Even quicker, I could just include the normal in my report template, and only displace it with dictated findings if there was any abnormality to report. Any stylistic objection I had to this new format was brushed aside as my productivity with the same old error-prone software soared.
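For the programmatically inclined, the checklist idea can be sketched in a few lines of Python. This is only an illustration of the logic, not how any particular VR product stores its templates; the names DEFAULT_TEMPLATE and build_report are made up for the example, and the three organs are just the ones mentioned above.

```python
# Hypothetical checklist template: every heading starts out as "Normal."
# (Headings and wording are illustrative, not taken from any VR product.)
DEFAULT_TEMPLATE = {
    "Liver": "Normal.",
    "Spleen": "Normal.",
    "Pancreas": "Normal.",
}


def build_report(findings=None, template=DEFAULT_TEMPLATE):
    """Return the report text, replacing a default only where a finding was dictated."""
    merged = dict(template)  # copy so the shared template is never modified
    for heading, text in (findings or {}).items():
        if heading not in merged:
            # Refuse unknown headings rather than silently inventing a new line.
            raise KeyError(f"No such heading in template: {heading}")
        merged[heading] = text
    # Dicts preserve insertion order, so the lines come out in template order.
    return "\n".join(f"{heading}: {text}" for heading, text in merged.items())


if __name__ == "__main__":
    print(build_report({"Spleen": "Mildly enlarged."}))
```

The point of the design is the same as in the dictation room: the only words that have to pass through the error-prone step are the abnormal ones.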

The vast majority of transcription mistakes that continue to slow me down come from a small rogues’ gallery of words and phrases that the VR software just doesn’t seem to get. This makes it all the more maddening when the software fails to learn from its mistakes; theoretically, it should only take a couple of rounds of “correcting” errors before they go away. For such synthetic learning disabilities, I’ve come up with two workarounds.

One: If the erroneous word is one you never actually use in your reports, like somebody’s name, a type of surgical hardware, or a body part that doesn’t show up in your subspecialty, go to your VR’s dictionary and delete the offender. As far as the software is concerned, that word no longer exists, and it can no longer use the word to torment you. Don’t worry, if you wind up having to actually use the deleted term once in the next decade, you can still type it into your report. Alarms will not go off.

Two: If deletion is not an option, train the word or phrase you want transcribed either as a new macro or (if you don’t feel like saying “macro” every time you want to dictate it) as a new nonsense word that you add to the VR’s dictionary. For instance, I once had a steady stream of arterial Dopplers in my worklist, but the VR program refused to transcribe cm/s for me. It would haphazardly spell out centimeters, seconds, or both. Sometimes it would change the slash to the word “per.” I wanted uniformity. So I added the nonsense word “zap” to the dictionary, such that any time I said “zap,” “cm/s” appeared on the screen instead.
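If it helps to see the trick spelled out, here is a minimal Python sketch of the substitution. It assumes the recognized text is available as an ordinary string, which is my simplification; in a real VR product the mapping lives in its dictionary settings, not in code. The names SUBSTITUTIONS and expand_triggers are hypothetical.

```python
import re

# Hypothetical trigger table: spoken nonsense word -> text that should appear.
# Keys must be lowercase, because matches are lowercased before lookup.
SUBSTITUTIONS = {"zap": "cm/s"}


def expand_triggers(text, substitutions=SUBSTITUTIONS):
    """Replace each whole-word trigger in text with its written form."""
    if not substitutions:
        return text
    # \b keeps "zap" from matching inside longer words such as "zapped".
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in substitutions) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: substitutions[match.group(1).lower()], text)


if __name__ == "__main__":
    print(expand_triggers("Peak systolic velocity 120 zap"))
```

Matching whole words only is the important choice here: a trigger should fire when you say it, and never when it merely happens to sit inside another word.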

I’ve since found that not all nonsense words are created equal. For instance, “zap” can easily get transcribed as “that.” Adding syllables helps, and sticking to unusual sounds minimizes the chance of sounding like something else in the dictionary. My personal favorite is Kalamazoo, which I now use hundreds of times per work shift. Michiganites and Dr. Seuss fans, take note.