Search system maps large medical informatics databases

April 25, 2003

PubMed, the bibliographic search service of the National Library of Medicine, provides access to more than 11 million MEDLINE citations dating back 35 years.

Although the number of citations grows at a steady pace, critics say the service's search engines are at best unexciting and at worst impenetrable. Information such as citation size, language, and relative value is not addressed.

One answer may be found in a revolutionary Web-based search tool called Visual Net, which helps users find information in complex directories and databases. Visual Net may prove useful not only for massive text databases like PubMed but also for radiology teaching files and data mining of PACS archives.

Medical informatics, with its exceptionally large databases, seems a natural application for Visual Net, whose advanced search and navigation features may help researchers and medical professionals spend as little time as possible searching for information.

The system was designed by Tim Bray, a search engine pioneer and a co-editor of the specification for the XML markup language. As Bray sees it, Visual Net lets users explore a database the way they would explore a road map, using visual cues instead of searching through long lists of text.

Search engines do a good job of finding individual items but are inefficient at giving users a feel for the general shape of what's out there, how much information is available, and what the important features are.

The University of Texas Southwestern Medical Center requested that Bray put a demonstration of the system's medical informatics potential online, and this can be found at http://pubmed.antarcti.ca/start .

Visual Net can map any hierarchical database. For PubMed, it uses each citation's keyword references to the Medical Subject Headings (MeSH) hierarchy. The online demo visualizes the anatomy/body regions section.

The data are extracted from the database using standard queries. For each PubMed citation, Visual Net captures the article's name, authors, MeSH keywords, date of publication, language, and other metadata. Primary MeSH keywords for a citation are used to place the citation in the MeSH hierarchy.
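The placement step can be sketched in a few lines of Python. This is only an illustration of the idea, not Visual Net's actual code: the Citation record, the place_citations function, and the representation of a MeSH position as a tuple of category names are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Citation:
    # Hypothetical record holding the metadata the article says is captured.
    title: str
    authors: list
    language: str
    year: int
    # Each primary keyword is given as a path of category names,
    # top level first (an assumed representation, not MeSH's own).
    primary_mesh: list


def place_citations(citations):
    """Map every hierarchy path, including ancestors, to its citations."""
    index = defaultdict(list)
    for citation in citations:
        seen = set()  # avoid filing one citation twice under a shared ancestor
        for path in citation.primary_mesh:
            for depth in range(1, len(path) + 1):
                prefix = tuple(path[:depth])
                if prefix not in seen:
                    seen.add(prefix)
                    index[prefix].append(citation)
    return index
```

Filing each citation under every ancestor of its keywords means a category's citation count already includes everything beneath it, which is what a map of nested regions needs.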

The initial data map that appears shows the top-level MeSH categories, with colored polygons representing citations filed under each top-level category.

The PubMed map is subdivided into categories. Clicking directly on a category causes a new map to appear, displaying only that category, further subdivided into smaller categories. Click on another category and the process repeats itself, much as a map of a country divided into provinces or states would be further subdivided into cities and neighborhoods.
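The drill-down behavior can be pictured as walking a tree one level at a time. The sketch below is a rough illustration under stated assumptions: the nested-dictionary layout, the drill_down and count_citations names, and the sample counts are all invented for the example and do not come from Visual Net or PubMed.

```python
def count_citations(node):
    """Total citations filed at this node and everywhere beneath it."""
    children = node.get("children", {})
    return node.get("citations", 0) + sum(
        count_citations(child) for child in children.values()
    )


def drill_down(root, path):
    """Return the subcategories shown after clicking through `path`."""
    node = root
    for name in path:
        node = node["children"][name]
    return {
        name: count_citations(child)
        for name, child in node.get("children", {}).items()
    }


# Made-up hierarchy and counts, for illustration only.
sample = {
    "children": {
        "Anatomy": {
            "children": {
                "Body Regions": {
                    "citations": 5,
                    "children": {
                        "Head": {"citations": 40},
                        "Thorax": {"citations": 90},
                    },
                }
            }
        }
    }
}

top_level = drill_down(sample, [])
body_regions = drill_down(sample, ["Anatomy", "Body Regions"])
```

Each click corresponds to extending the path by one name, so the same function serves the initial map and every map below it.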

The subcategories within a category are arranged alphabetically in rows from left to right, top to bottom. Category box sizes are determined by the number of citations in that category.
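A layout of this kind might be computed as follows. The article does not say how box size is derived from the citation count, so this sketch assumes square boxes whose area is proportional to the count; the layout_categories function, the row_width and scale parameters, and the sample counts are likewise hypothetical.

```python
import math


def layout_categories(counts, row_width=100.0, scale=1.0):
    """Place boxes alphabetically, left to right, wrapping to new rows."""
    boxes = []
    x = y = 0.0
    row_height = 0.0
    for name in sorted(counts, key=str.lower):
        # Assumption: box area tracks citation count, so side = sqrt(count).
        side = scale * math.sqrt(counts[name])
        if x > 0 and x + side > row_width:
            # Box does not fit on this row: start a new row below it.
            x = 0.0
            y += row_height
            row_height = 0.0
        boxes.append({"name": name, "x": x, "y": y, "side": side})
        x += side
        row_height = max(row_height, side)
    return boxes


# Made-up citation counts, for illustration only.
boxes = layout_categories(
    {"Head": 400, "Abdomen": 100, "Thorax": 900, "Back": 25}, row_width=40.0
)
```

With these sample counts, the first three boxes share the top row and the largest one wraps to a second row, which mirrors the reading order the article describes.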

Bray believes this representation helps users to understand the structure and scope of the database.

"In the real world, once you've found a place, you know where it is," Bray said. "It's easy to communicate to other people how to get there. The premise of our company is to take networks and make them more like real places."