Speech Therapy, Radiology-style


Who adapts to whom in voice recognition?

I’m told, as a young lad, I was once sent to a speech therapist to evaluate why, as my grandmother put it in her Brooklynese, I “tawked like an outta towneh.” At the end of the evaluation, the therapist concluded that nothing was amiss; I had a perfectly normal Midwestern accent.

I don’t know if anyone had the heart to tell her that none of our family hailed from, nor had ever lived in, the Midwest. The matter was pursued no further, and it wasn’t too many years before, instead, casual observers thought they detected a bit of British about me instead. I credit that to an upbringing rich in the likes of Monty Python, Fawlty Towers, etc.

While this may be evidence that accents are in the ear of the beholder, it’s been abundantly proven that we adapt our speech to our circumstances. We constantly hear what we say, and make adjustments to how we say it. (A reason why hearing loss can impact the way one speaks.) Most of this is on a less-than-conscious level.

That changes when one is being taught how to speak differently, such as by a speech therapist, an elocution instructor…or voice recognition software.

Yes, yes, I know…we’re been told, since voice rec became a thing, that it’s a technological bit of wizardry, the software “learns” as we go along, and it adapts to our way of speaking to result in better and better performance. Well, it’s been over a decade since I’ve been using voice rec, and presumably there have been more than a couple of upgrades during that interval…but it seems to me we humans are the ones doing most of the learning and adapting.

We learn, for instance, to use macros and templates, since they give the software less of an opportunity to mess with what we want transcribed.

I’ve also noticed that there are certain words and phrases that I have simply taught myself not to use…because, no matter how much software “retraining” I try, the voice rec just cannot get them right. Probably my earliest example was “soft-tissue swelling.” For whatever reason, if I dictated “soft hyphen tissue swelling” at my usual cadence, it would invariably come out as “soft tissue tissue swelling.”[[{"type":"media","view_mode":"media_crop","fid":"60430","attributes":{"alt":"Learning in voice recognition","class":"media-image media-image-right","id":"media_crop_5792206290505","media_crop_h":"0","media_crop_image_style":"-1","media_crop_instance":"7649","media_crop_rotate":"0","media_crop_scale_h":"0","media_crop_scale_w":"0","media_crop_w":"0","media_crop_x":"0","media_crop_y":"0","style":"float: right;","title":"©CoraMax/Shutterstock.com","typeof":"foaf:Image"}}]]

If I really cared about a never-transcribed-right word or phrase, I might make it a macro. For instance, the word “pathology.” You’d think that software aimed at diagnostic professionals would bend over backwards to make sure it could handle this one, wouldn’t you? Nope. I now have macros for just about every phrase I use with that word, such as “No new pathology,” because voice rec really, really wants to substitute terms like “patella” instead.

The remodeling of my speech runs deeper than that, however. I’ve actually altered the way I say certain words, accentuating sounds so the software will get it right, to the point that it bleeds over into my speech with other humans. Understandable, since I tend to dictate these terms hundreds, if not thousands, of times per week. My “than” no longer bears a remote resemblance to my “then” (the “a” now gets overemphasized to a borderline obnoxious degree). My “for” is now more of a “fer,” since the software otherwise shoehorns the number four into the darnedest places. My “to” is now a “tuh,” lest it become “too” or “two.” There are now a handful of terms where I accentuate weird syllables to ensure compliance from the software, like “abdominOHpelvic” or “thoracOHlumbar” lest those words get split into two.

I’ve even gotten accustomed to dropping articles, like “the,” “a,” “an,” etc. from my speech patterns, because the software just plays havoc with this stuff. Yes, I miss communicating in complete sentences as my erstwhile English teachers had so carefully taught me to do, but once again, fewer dictated words means less input for the software to screw up. For those of you who watched the Spartacus cable TV series, yes, I do find my article impoverished speech reminiscent of the characters’ dialogue.

Perhaps the voice recognition industry, at some point, will abandon its pretense of their products conforming to our needs, and instead start offering courses similar to the “accent reduction” classes that immigrants have found useful. If they don’t, someone else probably will.

Think about it: Suppose someone guaranteed you that their course could improve your voice recognition usage such that your daily productivity got boosted by, say, a humble 5%. For a decently productive rad, that would be more than enough of a gain to justify forking over a few hundred bucks in tuition.

Related Videos
Nina Kottler, MD, MS
The Executive Order on AI: Promising Development for Radiology or ‘HIPAA for AI’?
Related Content
© 2023 MJH Life Sciences

All rights reserved.