Speech-recognition companies hope radiologists embrace new products

September 25, 1996

IBM's debut of MedSpeak puts pressure on smaller firms

One of radiology's last bottlenecks is transcription. Radiology departments can spend anywhere from several hours to several days routing an exam interpretation from a radiologist to a transcriptionist and back in a lengthy process sure to draw the wrath of your average managed-care administrator.

Speech-recognition technology is one way to address the transcription bottleneck. By employing computers that digitally convert the human voice into text on a computer screen, speech recognition holds the promise of better healthcare at a lower price.

It's a great idea on paper, but it hasn't worked in practice. Speech-recognition systems have made minimal penetration of the radiology market, according to Marc Fine, president of speech-recognition developer Voice Activated Systems Technologies (VAST).

"The market has not even begun to be penetrated," Fine said. "It's an enormous opportunity."

The problem is that until recently, speech-recognition systems have been cumbersome to use and have forced radiologists to change their reading habits, according to Fine and other market watchers. Some of these systems require radiologists to use discrete speech, in which the speaker must talk at a slower speed, emphasizing each word deliberately, to prevent the computer from confusing words.

Increases in computer hardware power, however, are helping speech-recognition companies write more sophisticated software that overcomes the limitations of discrete speech with products that allow users to speak more naturally. A showdown is looming at this year's Radiological Society of North America (RSNA) meeting in December, where several new speech-recognition products will be introduced.

Beyond discrete speech. One such product will be MedSpeak/Radiology for Windows NT, a speech-recognition product developed by IBM that debuted this month. IBM claims that MedSpeak/Radiology is the first commercial product to recognize and process continuous speech in real time, in which the speaker talks at his or her normal rate and the computer translates that speech into text almost instantaneously.

By recognizing continuous speech, and by converting that speech into digitized text in real time, MedSpeak will allow radiologists to realize the benefits of speech recognition without forcing them to change their reading habits, according to David Wholley, market development manager for speech-recognition systems at the company's IBM Healthcare Solutions unit in Hawthorne, NY.

IBM developed MedSpeak after receiving a lukewarm reception for its discrete product for radiology, VoiceType 3.0, which has been on the market for several years.

"We found that most radiologists are not willing (to use discrete speech)," Wholley said. "They have said, whether it is our discrete product or Kurzweil's or someone else's, 'We're going to wait for continuous.' And the news is, it's here."

MedSpeak employs speech-recognition algorithms called trigrams, which analyze words in groups of three. Trigrams enable MedSpeak to use not only the phonetic profile of a word but also the context in which it appears to determine its meaning. This helps the program differentiate between words and symbols that sound alike, such as "to," "two," and "too."
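
The idea of using a word's two predecessors to choose among sound-alikes can be sketched in a few lines of Python. This is only a toy illustration, not IBM's implementation: the four-sentence corpus, the counting scheme, and the function name `pick_homophone` are all assumptions made for the example, and a real system would combine such counts with acoustic scores.

```python
from collections import Counter

# Toy training text (made up for this sketch); a real system would be
# trained on a very large body of dictated reports.
CORPUS = (
    "there are two nodules in the left lung . "
    "the patient was referred to radiology . "
    "the nodule is too small to characterize . "
    "two views of the chest were obtained ."
).split()

# Count how often each run of three consecutive words occurs.
TRIGRAMS = Counter(zip(CORPUS, CORPUS[1:], CORPUS[2:]))


def pick_homophone(prev2, prev1, candidates):
    """Return the candidate seen most often after the words (prev2, prev1)."""
    return max(candidates, key=lambda word: TRIGRAMS[(prev2, prev1, word)])


SOUND_ALIKES = ["to", "two", "too"]
print(pick_homophone("there", "are", SOUND_ALIKES))     # two
print(pick_homophone("was", "referred", SOUND_ALIKES))  # to
print(pick_homophone("nodule", "is", SOUND_ALIKES))     # too
```

The three candidates sound identical, so only the surrounding words separate them; that is the role the article attributes to context.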

MedSpeak's algorithms are also more powerful than VoiceType's, and the new product has a richer set of the data files that contain the acoustic information, language model information, and other data that are used to convert sounds to digital text.

IBM claims that MedSpeak has a 95% accuracy rating in recognizing speech of native English speakers with North American accents. English speakers with foreign accents can participate in a short training program that will boost the system's accuracy in recognizing their speech to close to 95%, Wholley said.

MedSpeak/Radiology's performance comes at a price, however. IBM's list price for a MedSpeak software license with a handheld microphone is $4495, compared with about $1200 for a radiology version of VoiceType 3.0. In addition, MedSpeak/Radiology requires the use of a powerful Pentium Pro PC with a 200-MHz processor running Windows NT, a configuration that can cost at least $3500 to $4500. Multiply that times four or five MedSpeak stations, and a radiology department can find itself investing over $40,000 to get the benefits of continuous speech recognition.

IBM claims that MedSpeak can pay for itself in a year by eliminating transcription costs. The company cites studies indicating that each radiologist in a hospital is responsible for $12,000 to $15,000 a year in transcription costs -- more than the cost of a MedSpeak station.
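
The arithmetic behind these figures can be checked with a short Python sketch. It uses only the prices quoted above and rests on two simplifying assumptions that are not IBM's: one station per radiologist, and transcription costs eliminated entirely. Support, training, and maintenance are ignored, and the function names are invented for the example.

```python
SOFTWARE_LICENSE = 4495                  # MedSpeak license with handheld microphone
PC_COST_RANGE = (3500, 4500)             # Pentium Pro 200-MHz PC running Windows NT
TRANSCRIPTION_PER_YEAR = (12000, 15000)  # per radiologist, from the cited studies


def station_cost(pc_cost):
    """Cost of one station: software license plus the PC."""
    return SOFTWARE_LICENSE + pc_cost


def department_cost(stations, pc_cost):
    """Cost of equipping a department with the given number of stations."""
    return stations * station_cost(pc_cost)


def payback_months(cost, yearly_savings):
    """Months until cumulative savings equal the up-front cost."""
    return 12 * cost / yearly_savings


low, high = (station_cost(pc) for pc in PC_COST_RANGE)
print(low, high)                                                  # 7995 8995
print(department_cost(4, PC_COST_RANGE[0]))                       # 31980
print(department_cost(5, PC_COST_RANGE[1]))                       # 44975
print(round(payback_months(high, TRANSCRIPTION_PER_YEAR[0]), 1))  # 9.0
print(round(payback_months(low, TRANSCRIPTION_PER_YEAR[1]), 1))   # 6.4
```

Under these assumptions the "over $40,000" figure corresponds to five stations with the more expensive PC, and a single station would recover its cost in roughly six to nine months.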

IBM began shipping MedSpeak a week after its Sept. 12 rollout, and will emphasize the product at its RSNA booth.

Just Do It! Another company ready to enter the speech-recognition arena is Voice Activated Systems Technologies. The Santa Rosa, CA, company will use the RSNA conference as a springboard to introduce its Do It! NoteTaker product for radiologists. The company already has a speech-recognition product for urology, and is debuting an ophthalmology version next month.

VAST is the product of a merger this May of VAST and a medical software company run by Fine, a medical imaging veteran who worked with ultrasound scanner vendors Diasonics and Teknar.

Unlike MedSpeak, Do It! NoteTaker is not a continuous speech product. VAST tackles the discrete speech dilemma differently, however, by having radiologists speak in short phrases rather than entire sentences. The phrases are then extrapolated by the computer into an entire exam report.

"We've developed a system that allows physicians to dictate very quickly, efficiently, and accurately using the kind of phrases they normally use," Fine said.

"When you are reading a mammogram, you are either going to get a normal result or a normal variation or certain abnormals that are pretty well defined. So rather than dictate the whole thing we can use forms and lists, and we combine these with macros, phrases, and templates."

Do It! NoteTaker has a vocabulary of about 60,000 words, including specialized radiology terms, and uses a speech-recognition engine developed by Dragon Systems of Newton, MA. The software is PC-based, with the preferred configuration a Windows 95 Pentium 133-MHz laptop computer with 32 megabytes of RAM. VAST has set a list price of $15,000 for a single terminal including hardware, but plans to emphasize leasing, according to Fine. A lease would probably run about $500 a month, well under the $1500 a month that VAST estimates a single radiologist spends on transcription.
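
A rough comparison of the lease and purchase options, using only the figures quoted above, might look like the Python sketch below. It assumes the system fully replaces transcription for one radiologist and ignores interest, lease terms, and maintenance, none of which the article specifies; the function names are invented for the example.

```python
LIST_PRICE = 15000              # single terminal including hardware
LEASE_PER_MONTH = 500           # Fine's approximate lease figure
TRANSCRIPTION_PER_MONTH = 1500  # VAST's estimate for one radiologist


def monthly_saving_on_lease():
    """Transcription spending avoided each month, net of the lease payment."""
    return TRANSCRIPTION_PER_MONTH - LEASE_PER_MONTH


def months_to_recover_purchase():
    """Months of avoided transcription needed to cover the list price."""
    return LIST_PRICE / TRANSCRIPTION_PER_MONTH


def months_until_lease_exceeds_purchase():
    """Months after which total lease payments pass the list price."""
    return LIST_PRICE / LEASE_PER_MONTH


print(monthly_saving_on_lease())              # 1000
print(months_to_recover_purchase())           # 10.0
print(months_until_lease_exceeds_purchase())  # 30.0
```

On these numbers a lease saves about $1000 a month from the start, while an outright purchase would take about 10 months to recover.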

VAST intends to begin shipping Do It! NoteTaker within a month after the RSNA conference. It is also developing a networked version of the product, running on Windows NT, to be released next year.

A third company developing a new speech-recognition product is Articulate Systems of Woburn, MA. The company is working on a radiology-oriented product called PowerScribe in conjunction with Philips Dictation Systems, which has developed a speech-recognition engine that it is licensing to other companies. PowerScribe is a Windows-based system designed to be used in a client-server architecture.

While PowerScribe employs continuous speech, it is not a real-time product: The system uses a batch processing technique, in which the computer records a session and then transcribes it later. Like VAST, Articulate Systems plans to debut the system at the upcoming RSNA conference, according to executive vice president Peter Durlach.

Taking aim at the leader. IBM, VAST, and Articulate Systems are taking runs at the dominant share of the radiology market held by Kurzweil Applied Intelligence of Waltham, MA. Kurzweil has long been synonymous with speech recognition, debuting the first version of its VoiceRad product in 1986.

Unfortunately, Kurzweil has found that being a market pioneer can sometimes be a lonely proposition. Although it has an overwhelming market share in radiology, that market is still very small, with most of the company's radiology sales going to military and Department of Veterans Affairs hospitals. The slow growth of the market is evident from a quick look at Kurzweil's financials: In its most recent quarterly results released in August, Kurzweil posted a net loss of $738,000 on revenues of $2.2 million for its second quarter of fiscal 1997 (end-July). The figures compared with a net loss of $298,000 on revenue of $2.5 million.

Kurzweil has also been forced to cope with the fallout of a financial scandal in which the company's former president and co-CEO, Bernard Bradstreet, was convicted for his role in a scheme to inflate revenues by booking millions of dollars in phantom sales for Kurzweil's products. Some industry experts believe the scheme was hatched in part because of the company's difficulty in securing orders from healthcare clients.

After the departure of Bradstreet and other executives implicated in the fraud, Kurzweil's board brought in Thomas Brew as CEO to turn the company around. Kurzweil officials say the company has put the scandal behind it, and point out that the executives involved with the scheme left the company two and a half years ago.

Kurzweil is pinning its hopes for a turnaround on Kurzweil Clinical Reporter, a new speech-recognition product that addresses some of the shortcomings that hindered acceptance of VoiceRad. For Clinical Reporter, Kurzweil has developed a new speech-recognition engine that is more accurate and faster than VoiceRad, according to John Bower, director of product management. Its error-correction and online database features have also been improved, and its user interface is now based on Windows 3.1, making it easier to learn than VoiceRad.

Clinical Reporter is a discrete speech product, but like VAST's Do It! NoteTaker, it allows radiologists to speak in short phrases. For example, saying the term "normal chest" would generate about a paragraph of text that the system interprets to mean normal chest. Clinical Reporter can also be used in free-text mode for more complicated cases, and has a free-text accuracy rating of up to 97%, according to Bower.
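
Phrase-driven reporting of this kind amounts to a lookup from a short spoken trigger to stored boilerplate, with a fallback to free text. The Python sketch below illustrates the general technique only; it is not Kurzweil's or VAST's design, and the `TEMPLATES` table, its bracketed placeholder text, and the function names are hypothetical.

```python
# Hypothetical trigger phrases mapped to stored boilerplate. The bracketed
# strings stand in for real report paragraphs, which are not given here.
TEMPLATES = {
    "normal chest": "[stored paragraph describing a normal chest exam]",
    "normal mammogram": "[stored paragraph describing a normal mammogram]",
}


def expand(phrase):
    """Return the stored paragraph for a known trigger, else the phrase itself."""
    key = " ".join(phrase.lower().split())  # normalize case and spacing
    return TEMPLATES.get(key, phrase)


def build_report(spoken_phrases):
    """Assemble a report from recognized phrases, one paragraph per phrase."""
    return "\n".join(expand(p) for p in spoken_phrases)


print(build_report(["Normal chest", "Follow-up in six months."]))
```

Anything that is not a known trigger passes through unchanged, which corresponds to the free-text mode used for more complicated cases.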

Kurzweil began shipping Clinical Reporter in July and intends to showcase the product at the RSNA meeting. List price of the system is about $5000 to $6000 for the software, which requires a Pentium-class PC with 32 MB of RAM. Kurzweil plans to release a Windows 95 version later this year and a Windows NT version in 1997.

Responding to IBM's challenge. IBM's renewed emphasis on the radiology market for speech recognition is a major challenge to Kurzweil, as well as smaller firms like VAST and Articulate Systems. Not only does IBM have the corporate resources of one of the world's largest companies, but it will likely push its claim that MedSpeak's continuous speech ability is a marked improvement over discrete speech technology as represented by Kurzweil's Clinical Reporter.

Kurzweil, however, hopes to counter IBM's assault by emphasizing features of Clinical Reporter that go beyond speech recognition and into information and data management. For example, users can extract data from Clinical Reporter and use them to populate fields in databases such as radiology information systems, according to Bower. With IBM's MedSpeak, users can transfer data from the system into an RIS, but only as an ASCII text file.

"The Kurzweil report can be automatically interfaced to your database, and the data is online instantly and immediately for analyses," Bower said. "If you have a field for diagnosis, we can populate it. If you have a field for patient age, we can populate it."

The growing number of companies debuting speech-recognition products is clearly a sign that it is a market with potential. Estimates peg the future size of the healthcare market for speech recognition at anywhere between $5 billion and $15 billion, according to VAST's Fine.

Whether the market will realize that potential is anyone's guess. But Kurzweil, IBM, VAST, and Articulate Systems are betting that increased computer power, new, more functional products, and the need to manage data more efficiently will help persuade radiologists to give up their Dictaphones. Also, the addition of several new competitors to a market formerly dominated by one company will make speech recognition seem more like an established technology rather than an untested concept, according to Kurzweil's Bower.

"In general PC voice computing, when IBM started advertising, our sales went up dramatically," Bower said. "We think that in this field it is going to help, too. It is going to raise awareness of the overall concept, and then we can compete on a feature and product basis."