Home-grown data-mining system strikes gold

July 1, 2007

Researchers at Mallinckrodt Institute of Radiology have developed a secure, web-based, HIPAA-compliant data-mining tool for radiology reports, built on the Google search engine and on free and open source technologies.

Dr. Joseph Erinjeri and colleagues downloaded 20 months of radiology reports (915,000 studies, 2.8 GB) in text format from their RIS to a file server running the Windows Server 2003 operating system. Indexing the documents took approximately 36 hours, an average of about 25,000 reports per hour.
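
As a quick sanity check on the figures above, the reported totals can be turned into per-hour, per-second, and per-report numbers. This is only a back-of-the-envelope sketch: the variable names are illustrative, and it assumes that "2.8 GB" means decimal gigabytes and that the 36 hours were spent indexing continuously.

```python
# Figures as reported in the article.
total_reports = 915_000
corpus_gb = 2.8          # assumed to be decimal gigabytes (1 GB = 1,000,000 KB)
indexing_hours = 36      # assumed to be continuous indexing time

# Derived rates.
reports_per_hour = total_reports / indexing_hours
reports_per_second = reports_per_hour / 3600
avg_report_kb = corpus_gb * 1_000_000 / total_reports

print(f"{reports_per_hour:,.0f} reports per hour")
print(f"{reports_per_second:.1f} reports per second")
print(f"{avg_report_kb:.1f} KB per report on average")
```

Under those assumptions the indexing rate works out to roughly 25,400 reports per hour, or about seven reports per second, with an average report size of around 3 KB.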

The search engine (Google Desktop), web server (Apache), and scripting language (Perl) are all open source and/or freely available.

A keyword search for a common term such as "patient" returned the 10 most relevant results out of 915,000 total matches in 0.72 seconds. A search for a less common term such as "moderate cardiomegaly" identified 7,300 matches in 0.43 seconds.

By using the existing Google search algorithm and framework, radiologists can quickly perform useful searches, the authors said at the American Roentgen Ray Society meeting.