Hybrid method conquers negation detection challenge

July 9, 2007

Negative findings in radiology reports and other clinical documents contain information important to clinical care, medical research, and education.

Before computers can index these findings correctly and use them in decision support and other applications, however, the findings must be detected automatically. Negation detection has become a universal challenge in digital information retrieval.

A new paper presents a novel automated approach to detect negative findings in free text (J Am Med Inform Assoc 2007;14:304-311).

"The method combines previously designed negation detection systems that used lexical regular expression approaches and syntactical, or grammatical, parsing," said Yang Huang, Ph.D., of the Center for Clinical Informatics at Stanford University.

The more common approach, regular expression matching, treats text as a string and matches it against predefined one-dimensional string patterns. Its weakness is that the scope of a negation can be difficult to determine, because the lexical approach does not use the syntactical structure of a sentence.
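
The scope problem can be illustrated with a minimal Python sketch. The pattern and the function name below are hypothetical and are not taken from the paper; the sketch simply treats everything after a negation signal, up to the end of the sentence, as negated.

```python
import re

# Hypothetical pattern (an assumption, not the paper's): a negation signal
# followed by all remaining text up to the next period.
NEGATION_PATTERN = re.compile(
    r"\b(no|without|not)\b\s+(?P<scope>[^.]*)", re.IGNORECASE
)

def lexical_negated_scope(sentence):
    """Return the text this purely lexical pattern treats as negated, or None."""
    match = NEGATION_PATTERN.search(sentence)
    return match.group("scope").strip() if match else None

# The pattern finds the negation in a simple sentence.
print(lexical_negated_scope("There is no evidence of cervical lymph node enlargement."))
# -> evidence of cervical lymph node enlargement

# Without syntax, the scope runs past the clause boundary and wrongly
# includes the positive finding about the liver.
print(lexical_negated_scope("No mass is seen but the liver is enlarged."))
# -> mass is seen but the liver is enlarged
```

In the second sentence the string pattern has no way of knowing that the negation ends at "but", which is the kind of scope error a parse tree can help avoid.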

Syntactical information is embedded in the sentence trees generated through parsing. These trees directly reflect an understanding of the sentence and are, therefore, more informative in detecting negations, Huang said.

"However, since it is very difficult to generate parse trees accurately, the syntactical approach may very likely be less accurate," he said.

Huang's remedy is a hybrid approach that combines the advantages of the two earlier approaches. The first step was to classify negations according to the syntactical categories of negation signals and the phrase patterns used to locate negated phrases.

For example, in the phrase "There is no evidence of cervical lymph node enlargement," "no" is the negation signal used to denote that a following concept is negated, and "cervical lymph node enlargement" is the negated phrase.

"A classifier first detects possible negation in a sentence and classifies the negation into one of 11 categories by regular expression matching," he said. "The computer then extracts the negated phrases from the parse tree, according to grammar rules developed for that negation type. Regular expression matching is fast and sensitive in identifying the type of negations, while the grammatical approach helps locate negated phrases accurately within or outside the proximity of the negation signal."
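
The two-stage idea Huang describes might be sketched in Python as follows. Everything specific here is an assumption for illustration: the two category names, their patterns, the single grammar rule, and the hand-built parse tree (a real system would obtain the tree from a parser and would use the paper's 11 categories, which this article does not list).

```python
import re

# Stage 1: hypothetical categories detected by regular expression matching.
CATEGORY_PATTERNS = {
    "determiner_no": re.compile(r"\bno\b", re.IGNORECASE),
    "preposition_without": re.compile(r"\bwithout\b", re.IGNORECASE),
}
# Signal word associated with each hypothetical category.
CATEGORY_SIGNALS = {"determiner_no": "no", "preposition_without": "without"}

def classify_negation(sentence):
    """Return the first category whose pattern matches, or None."""
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.search(sentence):
            return category
    return None

def leaves(node):
    """Collect the words under a node; a leaf looks like ("NN", "mass")."""
    label, *children = node
    if isinstance(children[0], str):
        return [children[0]]
    return [word for child in children for word in leaves(child)]

def extract_negated_phrase(node, signal):
    """Stage 2, illustrative rule: in the first noun phrase (NP) that
    starts with the signal, the remaining words are the negated phrase."""
    label, *children = node
    if isinstance(children[0], str):
        return None  # a single word cannot contain a phrase
    if label == "NP":
        words = leaves(node)
        if words[0].lower() == signal:
            return " ".join(words[1:])
    for child in children:
        found = extract_negated_phrase(child, signal)
        if found is not None:
            return found
    return None

def hybrid_negation(sentence, tree):
    """Classify with regex, then locate the negated phrase in the tree."""
    category = classify_negation(sentence)
    if category is None:
        return None
    return category, extract_negated_phrase(tree, CATEGORY_SIGNALS[category])

SENTENCE = "No mass is seen but the liver is enlarged."
# Hand-built parse tree, standing in for parser output.
TREE = ("S",
        ("S", ("NP", ("DT", "No"), ("NN", "mass")),
              ("VP", ("VBZ", "is"), ("VBN", "seen"))),
        ("CC", "but"),
        ("S", ("NP", ("DT", "the"), ("NN", "liver")),
              ("VP", ("VBZ", "is"), ("VBN", "enlarged"))))

print(hybrid_negation(SENTENCE, TREE))  # -> ('determiner_no', 'mass')
```

Because the rule reads the phrase boundary from the tree, the negation stops at "mass" and the positive finding about the liver is left alone.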

Huang's hybrid approach identifies negated phrases in radiology reports with a sensitivity of 92.6%, positive predictive value of 98.6%, and specificity of 99.87%. It could help indexing systems and software applications detect negative findings more accurately and eliminate most false positives.
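
For readers less familiar with these measures, the short Python sketch below shows how they are conventionally computed from counts of true and false positives and negatives. The counts are invented purely for illustration and are not the paper's data.

```python
def evaluation_metrics(tp, fp, fn, tn):
    """Standard definitions, from counts of true/false positives/negatives."""
    return {
        # Share of truly negated phrases that the system found.
        "sensitivity": tp / (tp + fn),
        # Share of phrases flagged as negated that really were negated.
        "positive_predictive_value": tp / (tp + fp),
        # Share of non-negated phrases correctly left unflagged.
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts, NOT taken from the paper.
metrics = evaluation_metrics(tp=90, fp=2, fn=10, tn=898)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

A high positive predictive value and specificity both indicate few false positives, while sensitivity reflects how many negated phrases are missed.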

"Modeling and integration of radiology reports and other narrative clinical documents are of critical importance for projects that support biomedical research and education successfully," Huang said.

With constant improvements in technology and research, intelligent computer applications will be able to help clinicians to improve the safety, quality, and efficiency of patient care, he said.