XML standards bring structure to healthcare text documents

December 27, 2005

Standards related to extensible markup language (XML) could help provide the foundation for better data mining in healthcare, according to researchers at the Justus-Liebig University in Germany.

Most healthcare data are plain text and are often neither accessible nor easy to find at clinical workstations. Researchers at the university propose XML-related standards as a solution (Int J Med Inform 2005;74(2-4):267-277).

"About 80% of healthcare data are narrative text and therefore difficult to interpret by machine," said Ralf Schweiger, Ph.D., of the Institute for Medical Informatics at Justus-Liebig. "We need to produce more structured data."

XML-related standards - XML schema, XForms, extensible stylesheet language, topic maps, and others - provide an infrastructure that might change the situation, he said.

Established database text-matching (search) approaches are often inadequate. They fail to fully exploit given relationships and can produce inaccurate and incomplete search results, according to Schweiger.

"We have therefore developed a different search method called 'topic matching' that relates the search terms meaningfully with each other," he said.

The topic-matching approach requires a flexible model that is able to represent sophisticated relationships between topics, documents, images, services, and other resources.

The topic maps standard turns out to be a good choice for a search model. It allows users to represent typed relationships such as "is synonym of" or "is topically related to," he said. However, structured data will not proliferate unless tools are developed that support filling XML-based structures with content and building XML-aware applications.
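The idea of relating search terms through typed relationships can be illustrated with a small sketch. It is not the researchers' implementation, which the article does not describe; the class `TopicMap`, its methods, and the sample topics and file names are all hypothetical, and the sketch assumes that relationships are symmetric and can be followed transitively.

```python
from collections import defaultdict


class TopicMap:
    """Toy topic map: topics, typed associations, and attached resources.

    Hypothetical illustration only; not the Justus-Liebig implementation.
    """

    def __init__(self):
        self._assoc = defaultdict(set)        # topic -> {(relation, other topic)}
        self._occurrences = defaultdict(set)  # topic -> {resource name}

    def relate(self, a, relation, b):
        # Assumption: both relation types quoted in the article are symmetric.
        self._assoc[a].add((relation, b))
        self._assoc[b].add((relation, a))

    def attach(self, topic, resource):
        self._occurrences[topic].add(resource)

    def expand(self, term, relations):
        """Return the term plus every topic reachable via the given relation types."""
        found = {term}
        frontier = [term]
        while frontier:
            current = frontier.pop()
            for relation, other in self._assoc.get(current, ()):
                if relation in relations and other not in found:
                    found.add(other)
                    frontier.append(other)
        return found

    def search(self, term, relations=("is synonym of",)):
        """Collect resources attached to the term or to any related topic."""
        hits = set()
        for topic in self.expand(term, relations):
            hits |= self._occurrences.get(topic, set())
        return hits


tm = TopicMap()
tm.relate("myocardial infarction", "is synonym of", "heart attack")
tm.relate("heart attack", "is topically related to", "aspirin")
tm.attach("myocardial infarction", "guideline-mi.xml")
tm.attach("aspirin", "drug-aspirin.xml")

narrow = tm.search("heart attack")
broad = tm.search(
    "heart attack",
    relations=("is synonym of", "is topically related to"),
)
```

In this sketch a plain text match on "heart attack" would find nothing, because no resource is attached under that exact term. Following only synonym links reaches the guideline, and also following topical links reaches the drug entry.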

"We simply describe the structure of the XML documents by a reference document, document type definition, or XML schema, and the author can immediately start to enter and change XML structured data using a Web browser," Schweiger said.
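As a rough illustration of describing structure by a reference document, the following sketch derives the allowed element paths from a sample XML document and reports any paths in a new document that the reference does not contain. The function names and the sample drug records are assumptions made for this example, and the check is far simpler than validation against a document type definition or XML schema.

```python
import xml.etree.ElementTree as ET


def element_paths(root):
    """Return the set of slash-separated element paths found under root."""
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def unexpected_paths(reference_xml, candidate_xml):
    """List element paths used by the candidate but absent from the reference."""
    allowed = element_paths(ET.fromstring(reference_xml))
    used = element_paths(ET.fromstring(candidate_xml))
    return sorted(used - allowed)


# Hypothetical reference document describing a drug information record.
REFERENCE = "<drug><name/><dosage><amount/><unit/></dosage></drug>"

# Hypothetical record entered by an author; <price> is not in the reference.
CANDIDATE = (
    "<drug><name>Aspirin</name>"
    "<dosage><amount>100</amount><unit>mg</unit></dosage>"
    "<price>5</price></drug>"
)

problems = unexpected_paths(REFERENCE, CANDIDATE)
```

Here `problems` lists the `/drug/price` path, the one element the reference document does not define. A production system would more likely rely on a schema-aware validator than on a path comparison like this one.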

XForms and extensible stylesheet language standards are used to develop customized user interfaces. The meta standards - resource description framework (RDF) and topic maps - allow users to establish intelligent search pathways.

"Our approach of plug-and-play XML has been applied to medical resources like drug information, clinical guidelines, and medical classification systems with promising results," Schweiger said.