Home-grown data-mining system strikes gold

Researchers at the Mallinckrodt Institute of Radiology have developed a secure, web-based, HIPAA-compliant data-mining tool for radiology reports that is built on the Google search engine and uses free and open source technologies.

Dr. Joseph Erinjeri and colleagues downloaded 20 months of radiology reports (915,000 studies, 2.8 GB) in text format from their radiology information system (RIS) to a file server running Windows Server 2003. Indexing the reports took approximately 36 hours, an average of about 25,000 reports per hour.
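
As a quick sanity check on these figures, the quoted rate follows from dividing the report count by the indexing time, and the corpus size implies a few kilobytes of text per report. The sketch below assumes "2.8 GB" means decimal gigabytes (10^9 bytes); the article does not say which convention was used.

```python
# Back-of-envelope check of the indexing figures quoted in the article.
# Assumption: "2.8 GB" is read as decimal gigabytes (1 GB = 10**9 bytes).
REPORTS = 915_000       # studies downloaded from the RIS
INDEX_HOURS = 36        # approximate indexing time
CORPUS_BYTES = 2.8e9    # 2.8 GB of plain-text reports

reports_per_hour = REPORTS / INDEX_HOURS
reports_per_second = REPORTS / (INDEX_HOURS * 3600)
avg_report_bytes = CORPUS_BYTES / REPORTS

print(f"{reports_per_hour:,.0f} reports/hour")      # 25,417 reports/hour
print(f"{reports_per_second:.2f} reports/second")   # 7.06 reports/second
print(f"{avg_report_bytes:,.0f} bytes/report")      # 3,060 bytes/report
```

In other words, "about 25,000 reports per hour" is roughly seven reports indexed every second, each averaging around 3 KB of text under the decimal-gigabyte assumption.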

The search engine (Google Desktop), web server (Apache), and scripting language (Perl) are all open source or freely available.
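
The article does not say how the scripts passed queries from the web server to the search engine. Purely to illustrate that layered arrangement, the following Python sketch (Python standing in for Perl) shows a web-facing script that reads a query string, calls a search backend, and returns escaped HTML. `search_backend` and its two sample reports are hypothetical placeholders, not the Google Desktop interface, and the sketch omits the TLS, authentication, and audit logging that a HIPAA-compliant deployment would require.

```python
import html
from urllib.parse import parse_qs


def search_backend(query, limit=10):
    """Hypothetical stand-in for the search engine (NOT the Google Desktop API).

    Returns up to `limit` (report_id, text) pairs whose text contains the query
    as a case-insensitive substring, searched over a two-report toy corpus.
    """
    reports = {
        "R1": "Moderate cardiomegaly. No pleural effusion.",
        "R2": "Patient positioned supine. Lungs are clear.",
    }
    q = query.strip().lower()
    hits = [(rid, text) for rid, text in reports.items() if q and q in text.lower()]
    return hits[:limit]


def app(environ, start_response):
    """WSGI application: read ?q=..., query the backend, return an HTML result list."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = params.get("q", [""])[0]
    hits = search_backend(query)
    # Escape everything echoed back so report text or queries cannot inject markup.
    items = "".join(
        f"<li><b>{html.escape(rid)}</b>: {html.escape(text)}</li>" for rid, text in hits
    )
    body = f"<p>{len(hits)} result(s) for: {html.escape(query)}</p><ul>{items}</ul>"
    data = body.encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "text/html; charset=utf-8"), ("Content-Length", str(len(data)))],
    )
    return [data]
```

To try it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` serves the page at http://127.0.0.1:8000/?q=cardiomegaly. Keeping the backend behind a single function call means the toy substring match could be swapped for a real index without touching the web layer.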

A keyword search for a common term such as "patient" returned the 10 most relevant of 915,000 total matches in 0.72 seconds. A search for a less common term, "moderate cardiomegaly," identified 7,300 matches in 0.43 seconds.
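
Each of these searches reports two numbers: the total count of matching reports and the top 10 by relevance. Google's ranking method is not described here, so the sketch below swaps in a textbook inverted index instead. It assumes AND semantics (every query term must appear in a report, with no phrase matching) and ranks matches by summed term frequency. `ReportIndex` and the sample reports are hypothetical.

```python
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lower-case alphanumeric tokens; a deliberately simple tokenizer."""
    return TOKEN.findall(text.lower())


class ReportIndex:
    """Toy inverted index: term -> {report_id: term count}. Not Google's algorithm."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, report_id, text):
        for term, count in Counter(tokenize(text)).items():
            self.postings[term][report_id] = count

    def search(self, query, k=10):
        """Return (total_matches, top_k_ids) for reports containing ALL query terms.

        Score = summed term frequency, a stand-in for a real relevance model.
        """
        terms = tokenize(query)
        if not terms:
            return 0, []
        # Intersect posting lists, starting from the rarest term to keep sets small.
        terms.sort(key=lambda t: len(self.postings.get(t, {})))
        matches = set(self.postings.get(terms[0], {}))
        for term in terms[1:]:
            matches.intersection_update(self.postings.get(term, {}))
        scored = sorted(
            ((sum(self.postings[t][rid] for t in terms), rid) for rid in matches),
            key=lambda pair: (-pair[0], pair[1]),  # highest score first, then by id
        )
        return len(matches), [rid for _, rid in scored[:k]]


index = ReportIndex()
index.add("R1", "Moderate cardiomegaly. No pleural effusion.")
index.add("R2", "Mild cardiomegaly. Patient rotated.")
index.add("R3", "Moderate cardiomegaly, unchanged; cardiomegaly again noted.")
total, top = index.search("moderate cardiomegaly")
print(total, top)  # 2 ['R3', 'R1']
```

Counting all matches is cheap once the posting lists are intersected, while only the top `k` need to be ranked and displayed, which is why a search engine can report a large total alongside a short results page.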

By using the existing Google search algorithm and framework, radiologists can quickly perform useful searches, the authors said at the American Roentgen Ray Society meeting.

© 2025 MJH Life Sciences. All rights reserved.