Speech Recognition Technology — Finally Ready for Prime Time?

With the release of the iPhone 4S and Siri, Apple has introduced speech recognition (SR) technology to the masses. Apple bills and markets Siri as a “humble personal assistant.” However, I doubt many radiologists, who have been working with SR technology for multiple years, would describe their SR software systems this way.

Radiologists who use SR software to dictate their reports often describe the experience as frustrating. Much of this frustration, I believe, stems from the introduction of SR before its maturity.

Apple is renowned for their elegant consumer gadgets. They delay release of their products until their engineers believe that a technology has improved to a point where its implementation will not cause undue frustration on the part of their users. Case in point, while the tablet computer existed long before the iPad, Jobs and Co. at Apple realized that it would not be readily embraced by the average user until the product could be very thin, very responsive, and light. Apparently the “magic” (to borrow a term Jobs liked to throw around) year when technology was robust enough to accomplish this was 2010.

Now that Apple has determined SR technology to be mature enough to incorporate it into its iconic iPhone, should radiologists stop their grumbling?

Speech recognition software, which facilitates radiology dictations, has many proven benefits. Last year in AJR I wrote about our experience at UNC Hospitals incorporating SR software into the radiologist’s workflow. We noticed substantial improvements in report turnaround time similar to what other observers had demonstrated, but also found that improvements in turnaround time varied significantly between users, due predominantly to different work habits. Users who spent more time training the system to recognize their voice and adding and revising terms in the vocabulary library, for example, benefited the most.

If you read between the lines of Apple’s marketing onslaught, you will realize that they advise the same type of behavior to improve Siri (the more you use it, the better it will get). Additional benefits of SR-driven dictation systems include the ability to leverage templates, mini macros, and structured reporting to improve accuracy, consistency, and billing. Moreover, advances in natural language processing coupled with SR software may make it possible to integrate targeted radiology decision support at the time of dictation, to structure unstructured reports, and to identify key content and highlight critical findings.

However, despite these benefits, SR software has potential drawbacks. Most notably, errors can result from incorrect capture of a radiologist’s voice and, more importantly, the intent of their language. Most recently, Sarah Basma and her group at Women’s College Hospital in Toronto found an alarmingly high rate of errors using SR software versus traditional transcriptionists for breast imaging reports. After correcting for multiple variables, Basma’s group found that reports generated with SR were eight times more likely to contain major errors than those generated by traditional transcription.

The conflicting reports on SR software should give us pause as we move forward. The companies producing SR software have improved their products significantly over the last few years. Despite these improvements, a recent Diagnostic Imaging poll shows that while 80 percent of users use SR software, 30 percent still find it hard to use and not as accurate as traditional transcription services.

Apple believes the technology is ready for prime time. However, as with most things tech, what is good enough for the average consumer product doesn’t necessarily stand up to the rigorous standards of healthcare. If Siri misinterprets your request for the weather forecast, you may get rained on. However, if a radiologist’s breast imaging report is recorded incorrectly, the potential consequences are much more substantial.

I would disagree wholeheartedly with the last comment. As a radiologist who has been using speech recognition software for dictating radiology reports for four years and has been diligently "supervising" my work and correcting reports to the best of my abilities, I have to say that the process is still very tedious and painful, requiring many extra hours of work on my part that were not necessary when a transcriptionist typed my dictation. I believe this extra editing, reading over reports two and three times yet still sometimes overlooking some pretty embarrassing mistakes, cancels out any increased productivity rendered by speech recognition technology. Plus, the program I use does not have "on the fly" abilities to correct a persistently misrecognized word; these words have to be sent in to a superuser to enter into the program for speech adaptation, and even after this happens, there is very little improvement. Not all speech recognition software is created equal. These programs are supposed to learn when corrections are made, but this is clearly not happening in many instances.
Fortunately, in our hospital, because of software incompatibilities with our PACS, we are still using "back end" dictation for mammography, and I would venture to say that turnaround time in that department is awesome without "front end" dictation, as the report goes in without hesitation for correcting every missed word, is transcribed and edited by a transcriptionist, and is returned for sign-out within minutes. Perhaps this should be the way speech recognition is best utilized.
Carole Roseland (not verified) @
As the healthcare industry scrambles to meet meaningful use and be sure we get things into an electronic format, technology plays an ever more important role. What concerns me is that very little is being done to assure that what is put into the patient's record with these systems is actually accurate data. The MT Inner Circle completed a study in the past few months about the errors seen every day in clinical documentation. Allowing speech recognized documentation to go directly into the record without any intervention from a qualified editor can result in some pretty awful things, and some of these things could cause harm to the patient. You can find a summary of that study here: http://mtinnercircle.com/2011/10/07/medical-transcription-identifying-errors-protecting-patients/

I hope we find ways to use technology that improves the safety of patient care, creates more efficiency, and still remembers that the patient is the north star in health care and what we get into that documentation needs to be correct.
Kathy Nicholls (not verified) @
I have used Dragon for many years, currently on 11.5, with only self-editing available to me. However, I firmly believe the welfare of the patient demands that, for the near future, all administrators should be required to offer each radiologist or other physician the option of "back end" editing by a professional transcriptionist, implying that they can simultaneously see and hear each word and make corrections as needed. The benefits in speed and cost should still be largely there. I am anxious to see Siri trials for rad reports.

Morton I. Goldstein MD
MORTON GOLDSTEIN (not verified) @
Errors in patients' reports, as Basma and her colleagues point out, are not a simple by-product of speech software (or any single tool); they are indicative of a broken process where editing and review are not happening as they should. Technologies such as speech recognition can and should be used to help clinicians document care, and should be used responsibly, as part of a well-managed clinical documentation process.

Medical speech recognition is one of many technologies that clinicians use as part of the care delivery process; it is considered a "supervised tool" that is purpose-built for increasing physician productivity. Rather than typing or waiting for a transcriptionist to turn around a medical report, physicians can speak into a microphone, see their speech turned into text on the screen, and review and modify or edit it accordingly. "Supervision" is key because the "review, modify/edit" part of the workflow is a critical quality assurance dimension that helps to ensure accurate and high-quality clinical documentation.

Speech recognition has never promised 100 percent (unsupervised) accuracy, and for this reason "supervision" is an institutionalized best practice leveraged by hundreds of thousands of physicians nationwide who use "speech" to produce documentation. If the "supervision" piece of the workflow is missing, inadequate, or not effective, the result can be poor-quality documentation.
Holly Dewar (not verified) @