Radiology tackles broad range of applications with data mining

May 26, 2005

Data mining has the potential to revolutionize clinical information management. But while data mining is a hot topic in information technology circles, it has yet to make many forays into radiology. Only a few departments have taken the plunge.


Data mining at its most fundamental is the monitoring of trends and the extraction of those trends from a vast repository of common data. It is a way to discover new meaning from existing data. Beyond that definition, data mining can encompass a profusion of applications.

Hospitals are branching out in the ways they use data repositories to improve their business operations and to help with clinical research, according to the second survey of data warehouse use in healthcare organizations performed by the Healthcare Information and Management Systems Society Data Warehouse and Data Mining Special Interest Group. The group reports that data warehouses are no longer the exclusive province of either the clinical or financial components of a healthcare organization. They are now more generally used in all areas of the hospital, said Larry Wolf, senior consulting architect at Kindred Healthcare in Louisville, KY.

Wolf presented the 2004 survey results during an electronic session at the February 2005 HIMSS meeting in Dallas.

In addition to financial analysis, budgeting, and clinical research (the areas that had been addressed in the group's first study, released in 2001), respondents reported using data warehouse technology to address several new areas, including managed-care analysis, clinical benchmarking, performance management, clinical effectiveness, and labor/productivity analysis.

Radiology departments are turning to data mining technologies to help them with everything from searching for specific teaching cases to determining exam ordering appropriateness.


Dr. Daniel L. Rubin, an informatics research scientist, and colleagues at Stanford University School of Medicine developed Radbank, a data warehouse using open source tools that has amassed more than 1.8 million radiological reports and 270,000 pathology reports. Physicians have mined the data in Radbank to identify patient cohorts for research studies, teaching cases, and summary statistics such as utilization rates of particular radiological procedures.

"Implementing Radbank has allowed us to access the vast clinical repository that has previously been trapped in disparate proprietary commercial medical information systems," Rubin said.

Radbank's data warehouse is implemented as a relational database. The system's developers identified and separated the core functions of Radbank into individual components to reduce the complexity of code management and evolution. Some of the components include separate modules to import historical data from each of the different source departmental databases, a module to capture and parse concurrent clinical data in HL7, a module to convert text reports to XML, and backup/recovery modules.
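The text-to-XML step can be pictured with a short sketch. The section headings, the `report_to_xml` helper, and the sample report below are assumptions made for illustration; the article does not describe Radbank's actual report layout or XML schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical section headings; real report layouts vary by institution.
SECTION_HEADINGS = ("HISTORY", "FINDINGS", "IMPRESSION")


def report_to_xml(report_text: str) -> ET.Element:
    """Wrap each recognized section of a plain-text report in an XML element."""
    root = ET.Element("report")
    pattern = re.compile(r"^(%s):\s*" % "|".join(SECTION_HEADINGS), re.MULTILINE)
    matches = list(pattern.finditer(report_text))
    for i, match in enumerate(matches):
        # A section runs until the next heading, or to the end of the report.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report_text)
        section = ET.SubElement(root, match.group(1).lower())
        section.text = " ".join(report_text[match.end():end].split())
    return root


# Made-up sample report, not taken from any real system.
sample = """HISTORY: Dysphagia.
FINDINGS: Multiple shallow ulcers in the mid esophagus.
IMPRESSION: Findings suggest esophagitis."""

xml_string = ET.tostring(report_to_xml(sample), encoding="unicode")
print(xml_string)
```

Once reports are tagged this way, a query can be restricted to a single section, such as the impression, rather than searching the whole report text.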

A review of Radbank's effectiveness revealed that physicians using it located 13 times as many teaching cases as were found manually, Rubin said at the 2004 RSNA meeting. In one instance, a radiologist searching for pathologically confirmed viral esophagitis found no cases, while Radbank discovered 55.

Additional work is under way to make Radbank more user-friendly by moving toward a Web-based interface and away from an SQL-query format, Rubin said. The staff required to develop and deploy Radbank was a small one: one programmer and one physician. The system took about one year to develop and cost less than $2000 to build.


A system called LEXIMER, for lexicon-mediated entropy reduction, developed at Massachusetts General Hospital, can isolate reports with positive findings and recommendations for additional action from those with no findings or recommendations. It can also identify patterns associated with those findings and recommendations.

"LEXIMER brings structure to unstructured radiology reports so we can perform highly sophisticated data mining," said Dr. Keith Dreyer, vice chair of radiology computing and information sciences at MGH.
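The article does not describe how LEXIMER's natural language engine works. As a much simpler stand-in, the sketch below flags reports that contain a recommendation by matching a few cue phrases and then computes a recommendation rate. The cue list, function names, and sample reports are all assumptions, and plain phrase matching is not LEXIMER's method.

```python
import re

# Hypothetical cue phrases. A real NLP engine must also handle negation,
# context, and synonyms; this pattern does none of that.
RECOMMENDATION_CUES = re.compile(
    r"\b(recommend(ed|ation)?|follow-up|further evaluation)\b",
    re.IGNORECASE,
)


def has_recommendation(report_text: str) -> bool:
    """Return True if the report contains any of the cue phrases."""
    return RECOMMENDATION_CUES.search(report_text) is not None


def recommendation_rate(reports: list[str]) -> float:
    """Fraction of reports flagged as containing a recommendation."""
    if not reports:
        return 0.0
    flagged = sum(has_recommendation(r) for r in reports)
    return flagged / len(reports)


# Made-up report snippets for illustration only.
reports = [
    "No acute cardiopulmonary abnormality.",
    "6 mm nodule in the right upper lobe. Follow-up CT in 6 months is recommended.",
    "Normal study.",
    "Indeterminate liver lesion; recommend MRI for further evaluation.",
]
print(f"{recommendation_rate(reports):.0%}")
```

Grouping the same calculation by modality and year would give the kind of trend figures reported for the MGH data.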

Creating LEXIMER wasn't easy. It took about 18 months to design and train the system's natural language engine, according to Dreyer, the chief developer. After the system was created, it required seven people to validate it and four more to implement it. Once validated, the system was fully integrated with the departmental RIS and operational for data mining in about three months.

A review of LEXIMER by Dr. Mannudeep K. Kalra, a research fellow in radiology, and colleagues at MGH found that in a seven-year period CT chest scans generated recommendations at rates ranging from 5% to 31%. In the same period, recommendation rates for MR were 5% to 13%, for mammography 5% to 12%, for nuclear medicine 4% to 5%, and for PET 3% to 28% (Table 1).

More than three million unstructured radiology reports were included in the database. A more detailed analysis of patterns over those seven years revealed that recommendation rates for all CT scans grew from 11% in 1995 to 20% in 2002, Kalra said at the 2004 RSNA meeting.

Data mining techniques developed at MGH can also be used to analyze physician ordering patterns and can deliver vital pieces of information: the number of specific examinations ordered by a physician, the number and type of indications given by the ordering physician, the positive exam outcome for each indication, and the average exam appropriateness rating by physician.

Dreyer gave an example using brain CT ordered by two hypothetical physicians: Dr. A ordered 1200 brain CT scans over a 12-month period. Of those exams, Dr. A reported visual disturbances as the exam indication for 60% of the cases and tinnitus for 40% of the cases. Dr. B, on the other hand, ordered 2000 brain CTs during the same time period, citing headache as the indication for 70% of the orders and dizziness for the other 30%.

With these data available and matched against modified ACR appropriateness criteria, Dr. A might show an overall appropriateness rating of 8.7 for CT of the brain, compared with 4.2 for Dr. B, Dreyer said. Further analysis could reveal a 75% positive exam outcome for brain CT for Dr. A compared with 35% for Dr. B.
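One plausible way to arrive at a per-physician figure like this is an order-weighted average of per-indication ratings. The sketch below uses the order counts from Dreyer's example, but the per-indication ratings are invented for illustration and are not actual ACR values, so the results do not reproduce the 8.7 and 4.2 quoted above. The article does not say how MGH computes its rating.

```python
from collections import defaultdict

# (physician, indication, number of brain CT orders), derived from the example.
orders = [
    ("Dr. A", "visual disturbances", 720),  # 60% of 1200
    ("Dr. A", "tinnitus", 480),             # 40% of 1200
    ("Dr. B", "headache", 1400),            # 70% of 2000
    ("Dr. B", "dizziness", 600),            # 30% of 2000
]

# Hypothetical ratings on a 1-to-9 scale. These are NOT actual ACR values.
rating_by_indication = {
    "visual disturbances": 9,
    "tinnitus": 8,
    "headache": 4,
    "dizziness": 5,
}


def average_appropriateness(orders, ratings):
    """Order-weighted mean appropriateness rating per physician."""
    total_orders = defaultdict(int)
    weighted_sum = defaultdict(float)
    for physician, indication, count in orders:
        total_orders[physician] += count
        weighted_sum[physician] += count * ratings[indication]
    return {p: weighted_sum[p] / total_orders[p] for p in total_orders}


result = average_appropriateness(orders, rating_by_indication)
for physician, score in result.items():
    print(physician, round(score, 2))
```

With these made-up ratings, Dr. A averages (720 × 9 + 480 × 8) / 1200 = 8.6 and Dr. B averages (1400 × 4 + 600 × 5) / 2000 = 4.3.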


Some radiology departments that have not yet developed specific data mining technology investigate data by manually extracting it from a number of systems. At Northwestern University, physicians aggregate the extracted data and analyze them on their own, according to Dr. David Channin, chief of imaging informatics.

The radiology department uses the extracted information in a number of ways, including tracking down all studies to ensure that they were promptly performed and interpreted.

"The primary goal of the data mining here is continuous quality improvement: knowing exactly what is being done, when, and where, and then continuously examining how we do things in order to improve," he said.

Implementing infrastructure to mine electronic data can be difficult because many information systems have proprietary database schemas with poor mechanisms for extracting reports. Channin found this to be a problem even with the large clinical information systems at his facility, where only a fraction of the desired information is available. A common theme among radiology departments attempting to tackle data mining is that homegrown systems must be developed.

One possible answer to the problem, Channin said, is to monitor the HL7 traffic between systems. The messages flowing between systems contain a large amount of information that can be used to perform data mining for clinical, research, and administrative purposes. Additionally, the Integrating the Healthcare Enterprise initiative specifies an audit trail and node authentication integration profile that defines how different information systems can all log messages to a central server.
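To give a sense of what such messages contain, the sketch below splits a pipe-delimited HL7 version 2 message into segments and pulls out the message type and the ordered procedure. The sample message is synthetic, and the `parse_hl7_segments` helper is an illustrative assumption rather than anything used at Northwestern; a production listener would more likely rely on an established HL7 library.

```python
def parse_hl7_segments(message: str) -> dict[str, list[list[str]]]:
    """Split an HL7 v2 message into segments keyed by segment ID."""
    segments: dict[str, list[list[str]]] = {}
    # Segments end with a carriage return; tolerate newlines as well.
    for line in message.replace("\n", "\r").split("\r"):
        if not line.strip():
            continue
        fields = line.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


# Synthetic order message; every identifier and value here is made up.
sample_message = "\r".join([
    "MSH|^~\\&|RIS|HOSP|PACS|HOSP|200501151030||ORM^O01|MSG0001|P|2.3",
    "PID|1||000000^^^HOSP||DOE^JANE",
    "OBR|1|ORD123||CTHEAD^CT HEAD WITHOUT CONTRAST|||200501151015",
])

segments = parse_hl7_segments(sample_message)
msh = segments["MSH"][0]
obr = segments["OBR"][0]

# In MSH the field separator itself counts as MSH-1, so MSH-9 sits at index 8.
message_type = msh[8].split("^")[0]
# In other segments, list index n corresponds to field n; OBR-4 is the procedure.
procedure_code, procedure_name = obr[4].split("^")[:2]
print(message_type, procedure_code, procedure_name)
```

Logging these few fields for every message that passes between systems would already yield a record of what was ordered and when.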


The radiology department at the University of California, Los Angeles, like Northwestern, uses data mining for a variety of applications, including examining workflow and productivity, according to Dr. Osman Ratib, vice chair of information systems in the radiology department.

"The data that we have from the RIS can be mined to track workloads, efficiency, and turnaround time," he said.

While the answers to all of these operational queries reside in the RIS, vendors currently provide very little to extract those answers, Ratib said. The department developed its own software layers to place over the RIS for data mining purposes.

Using the software, managers can better monitor daily and even hourly workload changes on particular scanners. They can then make appropriate plans for renewal and extension of scanners in areas where they reach capacity. Workload trends extracted through data mining can also be used to plan for extended hours in certain areas to provide better coverage and to accommodate more studies per day.
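A workload query of this kind amounts to counting exams per scanner per hour and flagging the busy periods. The sketch below assumes a hypothetical RIS export of scanner IDs and exam start times; the records, the function names, and the capacity threshold are invented for illustration and do not reflect UCLA's software.

```python
from collections import Counter
from datetime import datetime

# Hypothetical RIS export: (scanner ID, exam start time). Values are made up.
exams = [
    ("CT-1", "2005-01-15 08:05"),
    ("CT-1", "2005-01-15 08:40"),
    ("CT-1", "2005-01-15 09:10"),
    ("CT-2", "2005-01-15 08:20"),
    ("PETCT-1", "2005-01-15 09:30"),
]


def hourly_workload(exams):
    """Count exams per (scanner, date, hour) bucket."""
    counts = Counter()
    for scanner, started in exams:
        ts = datetime.strptime(started, "%Y-%m-%d %H:%M")
        counts[(scanner, ts.date().isoformat(), ts.hour)] += 1
    return counts


def over_capacity(counts, max_per_hour):
    """Return the buckets whose volume exceeds an assumed hourly capacity."""
    return sorted(key for key, n in counts.items() if n > max_per_hour)


workload = hourly_workload(exams)
print(over_capacity(workload, max_per_hour=1))
```

Run over months of data rather than a handful of records, the same counts would show which scanners routinely hit capacity and at what hours.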

"Data mining allows us to make capital investment decisions in areas that show rapid growth, like the unpredicted growth in CT due to multislice CT capabilities, which has led to investments in more high-end CTs, and the similar growth in hybrid PET/CT scanners with the rapid increase in demand from oncology for these types of studies," he said.

Aside from tracking financial and workflow information, the department is also involved in InfoShare, a hospital-wide project to share and produce mineable data. InfoShare was developed using funding received from the National Library of Medicine's Integrated Advanced Information Management Systems program. The InfoShare program, now in its operations phase, is based on policies at UCLA to share technology investments in information systems from one area of patient care, education, or research to enhance activities in other areas, according to the project's Web site.

InfoShare aims to tailor presentation of the medical record to the specific context in which the data are being requested, such as clinical, research, or educational requirements. It will also link investment in an institutional review board electronic administration system to the electronic medical record for authentication.


With so few data mining tools specific to radiology applications commercially available, most departments have been required to develop their own systems. These homegrown solutions vary widely depending on the specific information they are tracking. Many respondents to the HIMSS data warehouse survey said they have more than one type of data warehouse architecture in use at their facility (Table 2), according to Wolf.

Most data warehouse theories argue for a tightly integrated data-sharing model, in contradiction to the variety of architectures actually used, often at the same institution, Wolf said.

The question remains whether organizations will reduce architectural complexity to meet the integrated warehouse gold standard, or whether the architectures will stay fragmented to cope with the increasing complexity and wide array of tasks being addressed, he said.

As each department within a hospital facility develops its own data mining infrastructure, the task of using information from one part of the hospital to improve research or operations in another becomes more difficult.

Another challenge to data mining is the wide variety of medical terminologies in use in many hospitals, Rubin said. A disease like diabetes may be recorded in the medical record using the term "diabetes mellitus" or "diabetes" or "DM" or an ICD-9 code.

Physicians need to develop effective strategies to map existing medical data to controlled terminologies or to collect medical information directly using these terminologies to make data mining activities more effective, Rubin said.
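In its simplest form, mapping existing data to a controlled terminology is a lookup from each recorded variant to a single concept. The sketch below uses the diabetes variants mentioned above; the mapping table, the concept label, and the `normalize_term` helper are illustrative assumptions, and the label is a placeholder rather than an identifier from any real terminology.

```python
# Hypothetical mapping table. The concept label is a placeholder, not an
# identifier from any real controlled terminology.
TERM_TO_CONCEPT = {
    "diabetes mellitus": "DIABETES_MELLITUS",
    "diabetes": "DIABETES_MELLITUS",
    "dm": "DIABETES_MELLITUS",
    "250.00": "DIABETES_MELLITUS",  # an ICD-9-CM code for diabetes mellitus
}


def normalize_term(raw):
    """Map a recorded term to a controlled concept, or None if unmapped."""
    # Lowercase and collapse whitespace so trivial variants share one key.
    key = " ".join(raw.strip().lower().split())
    return TERM_TO_CONCEPT.get(key)


recorded = ["Diabetes Mellitus", "DM", "diabetes", "250.00", "hypertension"]
mapped = {term: normalize_term(term) for term in recorded}
unmapped = [term for term, concept in mapped.items() if concept is None]
print(mapped)
print("needs review:", unmapped)
```

Terms that fail to map, such as "hypertension" here, are the ones that would need manual review or an expanded table. An ambiguous abbreviation like "DM" also shows why a plain lookup is only a starting point.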

"The technology exists for us to start tapping into the rich electronic data resources available in hospital information systems. The challenge is that data warehouse technology thus far has only been sparsely implemented in hospitals, likely because of cost and technical issues," he said. "A goal of our work is to delineate methods to make such implementation tractable and to encourage the healthcare community to pursue similar projects."