What They Didn't Tell You About Voice Recognition Software

Promoted as a helpful tool for increasing efficiency, voice recognition can be an endless source of frustration for radiologists.

Longtime readers of this column may recall that I’m not a cheerleader for voice recognition. I recognize it as a necessary evil, but I do miss transcriptionists.

I initially encountered voice recognition (VR) during fellowship. My first couple of jobs subsequent to that were holdouts, but it was only a few years later that even those employers ditched their tape recorders. I’ve been stuck with Dragon machines ever since.

One thing they proudly tell folks being trained on voice recognition (and, indeed, one of the big selling points for VR in the first place) is that the software learns how you speak. It adapts to you, so its performance will improve over time.

For an awful lot of us rads, that turns out to be an empty promise (or a big fat lie for those particularly cross about it). The software makes the same stupid mistakes with just as much frequency in our 10th, even 20th, year of use as it did during the first month.

We tie ourselves in knots trying to make it work better. We’re told to rerun the “Audio Wizard” to ensure our microphones are working well. There is also the “General training” routine of reading grammatically butchered sample radiology reports. These are time-wasting efforts that annoy us without evident payoff, until we get sick of running in circles and stop even trying to get help. I suppose the support people, no longer having to field our calls, might consider that a good outcome.

I have actually noticed that, if I stay at a job long enough, I can detect a slow, steady deterioration in my VR’s performance. It’s not just failing to learn me; it is unlearning. Other rads have confirmed they’ve had the same experience. For a while, I just thought my microphones and headsets weren’t aging well, and I tried replacing them. Unfortunately, that never helped in the long run.

Yes, there are some fortunate folks out there (including some of you readers) who feel like the tech works just fine for them. I’m sure there are others who just shrug and use a million macros so the VR has as little opportunity as possible to mess with their words.

For everyone else, allow me to share a little wisdom I gained during training for my new job. The PowerScribe guy just happened to mention something that blew my mind. I already shared it on social media because I feel everybody should be told about this. If I can incrementally improve even a few rads’ lives, it might be one of the best accomplishments of my career.

What was the PowerScribe guy’s advice? Don’t use the backspace or delete buttons on your keyboard in your dictated reports. Ever. Instead, highlight the word(s) you want to get rid of and dictate over them, or use voice commands like “delete that.”

Why? Well, evidently a big chunk of the “learning” that VR does happens when you sign your report. The software compares its audio recording of what you said against the printed words on your screen. So, whenever you use the keyboard to delete things, the software has audio of you saying stuff but has no writing to go along with it. The mismatch does a little damage to your speech recognition profile. Based on what he said, I also wonder whether manually typing stuff in your report without giving the software audio input might do similar harm.
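For readers who like to see the idea spelled out, here is a minimal, purely illustrative sketch of that mismatch. It is not PowerScribe’s or Dragon’s actual code; the adaptation_pairs function, the sample word lists, and the assumption that the engine learns only from audio it can align to the signed text are my own simplifications of what the trainer described.

```python
from difflib import SequenceMatcher

def adaptation_pairs(heard_words, signed_words):
    """Align what the engine heard against the signed report text.

    Returns (usable_pairs, orphaned_audio):
      usable_pairs   -- (heard, signed) word pairs the profile can learn from
      orphaned_audio -- spoken words with no signed text behind them,
                        e.g. words that were backspaced out on the keyboard
    """
    matcher = SequenceMatcher(a=heard_words, b=signed_words)
    usable_pairs, orphaned_audio = [], []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag in ("equal", "replace"):
            # "equal": the dictation matched the final text.
            # "replace": the text was corrected by voice or dictated over,
            # so the audio still maps to real words the engine can learn from.
            usable_pairs.extend(zip(heard_words[a1:a2], signed_words[b1:b2]))
        elif tag == "delete":
            # Audio exists, but the words were removed before sign-off:
            # nothing for the engine to pair it with.
            orphaned_audio.extend(heard_words[a1:a2])
    return usable_pairs, orphaned_audio

# The engine heard a doubled "is"; the extra one was backspaced out by keyboard.
heard = "there is is no evidence of pneumothorax".split()
signed = "there is no evidence of pneumothorax".split()

pairs, orphans = adaptation_pairs(heard, signed)
print(f"{len(pairs)} usable audio/text pairs, orphaned audio: {orphans}")
# -> 6 usable audio/text pairs, orphaned audio: ['is']
```

In this toy model, words you fix by re-dictating still give the engine a legitimate audio-to-text correction, while words you backspace out leave “orphaned” audio with nothing to learn from, which is exactly the kind of profile damage the PowerScribe trainer was warning about.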

Multiply such damaging incidents by a hundred reports per day, and it is easy to imagine your software profile getting seriously corrupted in time. That corruption will impair its performance for you until you get so fed up that you scrap the profile and start over with the annoying “Is this really doing anything?” training routine for the umpteenth time.

I have no idea why it works like this. Off the top of my head, I can think of at least three ways the software could handle keyboard edits without making them self-destructive. I can nevertheless live with the notion that VR programmers, savvier than I, had good reasons for doing it this way, at least initially. It’s harder for me to imagine that subsequent generations of voice recognition engineers really tried to find a better way of doing things, yet came up empty.

Absolutely no good reason exists, however, for this to be the way things roll without VR trainers, support, and even vendors telling the users about it. The fact that I (and most of the other rads reacting to my post on social media) have gone through multiple rounds of training and troubleshooting, and am only now, 18 years into my voice recognition life, hearing about it? Inexcusable.

I cannot believe that the majority of VR training/support folks would know about this and choose not to mention it to users, especially users complaining of steadily degrading performance. Sure, some VR people might be sloppy or uncaring, but such a widespread deficiency tells me that most of them don’t know about this either.

Maybe it’s not in their own training or periodic refresher courses. Maybe it is mentioned in passing and given far too little emphasis. For something so fundamental to the software’s “learning” ability, it should be one of the top-ten bullet points, repeated over and over, so there’s zero chance it will slip through anyone’s mental cracks. While they are at it, their programmers should be given a priority-one directive for future updates: Find a way to let users employ the keyboard again without sabotaging themselves!

In the meantime, since I’ve been typing for almost my entire life and old habits die hard, I put a small piece of painter’s tape (masking tape would also work) on the backspace and delete keys. The sight, or even the feel, of the tape will hopefully keep me from transgressing ever again.
