TalkStation makes 'smart' guesses when radiologists mumble words

February 6, 2002

Enhancements boost speech recognition accuracy

The biggest problem with speech recognition programs is that people speak imperfectly. They muffle words, or their voices trail off at the ends of sentences. They turn away from the microphone when dictating or run words together. Transcriptionists make up for those problems by knowing the person and the context of the discussion. Speech recognition programs have long tried to take the measure of speakers using algorithms that analyze voice patterns. Talk Technology has now added the second piece, context, to its TalkStation product.

"We have put some smarts into our dictionary," said Milan di Pierro, director of product management at Talk Technology.

Talk Technology has integrated a dictionary customized to radiology with interpretive algorithms that examine the syntax of the spoken words. Statistical analyses connect words that commonly occur in a phrase or sentence. If the first word comes through clearly but the second is garbled, for example, the computer recognizes the first and then matches it to the likely second word. The result is fast and accurate transcription, according to di Pierro.
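
The word-pairing idea described above can be illustrated with a simple bigram count. This is a minimal sketch, not Talk Technology's algorithm: the toy corpus, the function names, and the fallback rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of radiology phrases (not from Talk Technology).
CORPUS = [
    "no acute fracture",
    "no acute fracture",
    "no acute disease",
    "pleural effusion noted",
    "pleural effusion present",
    "pleural thickening noted",
]


def train_bigrams(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def pick_next_word(counts, previous_word, candidates):
    """Choose the candidate most often seen after previous_word.

    `candidates` stands in for the acoustically plausible guesses for a
    garbled word. If none of them was ever seen after previous_word,
    fall back to the first candidate.
    """
    followers = counts.get(previous_word, Counter())
    best = max(candidates, key=lambda w: followers[w])
    return best if followers[best] > 0 else candidates[0]


model = train_bigrams(CORPUS)
# The first word was heard clearly; the second was garbled.
guess = pick_next_word(model, "pleural", ["infusion", "effusion", "illusion"])
print(guess)
```

A production language model would weigh acoustic scores together with word statistics and look at more than one preceding word; the sketch shows only the statistical half of that decision.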

"We use this language model to mitigate the amount of time the user must spend training the program," he said.

This type of user-specific training has long been a core requirement of speech recognition programs. Users dictate text during an "enrollment" period to familiarize the program with individual nuances in word pronunciation, as well as other environmental factors, such as background noise. TalkStation and its radiology-specific language model take only about five minutes to enroll a user, di Pierro said.

The language model is part of version 2.2, released at the 2001 RSNA meeting. Several other new features were designed to widen the product's appeal.

To improve reliability, V2.2 automatically switches to an independent mode if the server fails, so that the computer running the software continues transcribing the dictation on its own. The software then resynchronizes with the shared database when the server begins functioning again.
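
The fail-over behavior can be pictured as a local holding queue that drains once the server returns. The sketch below is an assumption about the general pattern, not a description of TalkStation's internals; `SharedServer`, `Workstation`, and their methods are hypothetical names.

```python
class SharedServer:
    """Stand-in for the shared database server (hypothetical)."""

    def __init__(self):
        self.online = True
        self.reports = []

    def save(self, report):
        if not self.online:
            raise ConnectionError("server unavailable")
        self.reports.append(report)


class Workstation:
    """Dictation station that keeps working when the server is down."""

    def __init__(self, server):
        self.server = server
        self.pending = []  # reports held locally while the server is down

    def store(self, report):
        """Try the server first; fall back to local storage on failure."""
        try:
            self.server.save(report)
        except ConnectionError:
            self.pending.append(report)

    def resync(self):
        """Push locally held reports, oldest first, once the server is back.

        Returns True when nothing is left in the local queue.
        """
        while self.pending:
            try:
                self.server.save(self.pending[0])
            except ConnectionError:
                return False
            self.pending.pop(0)
        return True


server = SharedServer()
station = Workstation(server)
station.store("report 1")
server.online = False
station.store("report 2")  # server is down, so this is kept locally
server.online = True
station.resync()
```

A report is removed from the local queue only after the server has accepted it, so a second outage during resynchronization does not lose work.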

Users of the newest version of TalkStation can boost productivity by using the Internet to access automatic ICD-9 coding, in which a language-processing algorithm reads the report and assigns a diagnostic code to assist in billing. The algorithms are available online from either of two companies working with Talk Technology. Developers have also expanded beyond English to include German, French, and Finnish language models.

The growing international flavor of TalkStation reflects an aggressive marketing strategy that has led this product into the camps of major PACS developers. GE, Philips, Siemens, IDX, Agfa, Sunquest, Kodak, Canon, and Algotec are among the companies integrating TalkStation into their own product lineups. Talk Technology provides a software developers' kit and staff to smooth the integration process, which sometimes can be completed in a matter of several weeks.

Company executives are vague about the price of the product to end users. The actual outlay depends on the number of stations installed, the training and professional services the customer wants, and the level of integration needed with existing IT systems. The one constant, di Pierro said, is the rate of payback.

"Typically the cost (to end users) is about the same as one year of transcription services," he said. "Customers get a complete return on investment in 16 months."

To assist in making its pitch to prospective customers, the company has developed a cost-benefit analysis tool. This tool records the quantity of transcription and number of reports being generated at the facility, considers the cost of these services, predicts the time needed to achieve full use of a TalkStation system, then calculates the rate of return and net value of installing the system.
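
The arithmetic behind such a tool can be sketched in a few lines. Everything here is an assumption for illustration: the company has not published its formula, the linear ramp-up toward full use is a guess at how "time needed to achieve full use" might be modeled, and the dollar figures are invented.

```python
def cumulative_savings(monthly_transcription_cost, ramp_months, months):
    """Total transcription cost avoided after `months` months.

    Assumes adoption rises linearly and reaches full use at ramp_months.
    """
    total = 0.0
    for m in range(1, months + 1):
        adoption = min(1.0, m / ramp_months)
        total += monthly_transcription_cost * adoption
    return total


def payback_month(system_cost, monthly_transcription_cost, ramp_months,
                  max_months=120):
    """First month in which savings cover the system cost, or None."""
    for months in range(1, max_months + 1):
        saved = cumulative_savings(monthly_transcription_cost,
                                   ramp_months, months)
        if saved >= system_cost:
            return months
    return None


def net_value(system_cost, monthly_transcription_cost, ramp_months, months):
    """Savings minus system cost over a fixed horizon."""
    saved = cumulative_savings(monthly_transcription_cost,
                               ramp_months, months)
    return saved - system_cost


# Hypothetical facility: $10,000 a month in transcription, a system priced
# at one year of that service, and eight months to reach full use.
print(payback_month(120_000, 10_000, 8))
print(net_value(120_000, 10_000, 8, 24))
```

With these invented inputs the sketch reports payback in month 16. That agreement with the quoted figure follows from the ramp-up chosen here and says nothing about the company's actual model, which may also account for maintenance, training, and other ongoing costs.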

Talk Technology has held down the cost of TalkStation by using off-the-shelf technology whenever possible. Its software runs on a Pentium-based platform, itself a testament to the advances made in desktop computing over the past several years. (The company recommends Pentium III or better with 256 MB of RAM.) Customers can choose from an array of microphones provided by different manufacturers.

While continuing to keep costs down, V2.2 goes a long way toward meeting the major challenge for developers of speech recognition programs: overcoming human foibles. But some difficulties remain. One of the biggest is changing the negative perception of speech recognition held by many prospective buyers, a perception that stems from consumer PC products whose developers promised more than they could deliver.

"It does make it a little bit of a harder sell initially, but the nice thing is we can exceed this (lowered) expectation many times over," di Pierro said.