Medical de-identification system addresses health records privacy issues

October 17, 2008

The Health Insurance Portability and Accountability Act safeguards patients' personal health information, but it also tends to complicate medical research by restricting access to the medical records needed to develop public health measures.

A new medical de-identification tool announced in September by researchers at the Regenstrief Institute at Indiana University offers a way around this obstacle (J Am Med Inform Assoc 2008;15:601-610).

"Too often, researchers are relegated to using relatively small amounts of patient data in their studies because obtaining additional data is impractical, costly, and time-consuming," said Regenstrief staff scientist Dr. Jeff Friedlin, an assistant professor of family medicine at the IU School of Medicine.

Friedlin's tool, the Medical De-Identification System, or MeDS, attempts to facilitate acquisition of much larger amounts of data.

MeDS reads medical reports and removes any information that could be used to identify a patient, such as names, medical record identifiers, and social security numbers. Dates are also considered patient identifiers, and MeDS can be programmed to either remove or randomly change each date. All pertinent medical data are retained.
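To make the remove-or-change choice for dates concrete, here is a minimal Python sketch. It is not MeDS code: the function name `scrub`, the placeholder tags, the `MM/DD/YYYY` date format, and the 30-day shift window are all assumptions chosen for illustration, and a real system would need to cover many more identifier and date formats.

```python
import random
import re
from datetime import datetime, timedelta

# Assumed formats, for illustration only.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def scrub(text, date_mode="remove", max_shift_days=30, rng=None):
    """Mask social security numbers, then remove or randomly shift dates."""
    rng = rng or random.Random()
    text = SSN_RE.sub("[SSN]", text)

    def replace_date(match):
        if date_mode == "remove":
            return "[DATE]"
        try:
            original = datetime.strptime(match.group(0), "%m/%d/%Y")
            # Shift by a non-zero number of days in either direction.
            offset = rng.randint(1, max_shift_days) * rng.choice((-1, 1))
            shifted = original + timedelta(days=offset)
        except (ValueError, OverflowError):
            # Not a usable calendar date: fall back to removing it.
            return "[DATE]"
        return shifted.strftime("%m/%d/%Y")

    return DATE_RE.sub(replace_date, text)


print(scrub("SSN 123-45-6789, admitted 10/17/2008"))
print(scrub("Admitted 10/17/2008", date_mode="shift"))
```

In this sketch each date is shifted independently, which matches the article's "randomly change each date" but does not preserve the intervals between dates within a record.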

"MeDS segments reports into sections, sentences, and words, then uses a sophisticated pattern-matching technology to detect identifiers," Friedlin said.
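The segment-then-match approach Friedlin describes might look roughly like the following Python sketch. The segmentation rules (blank lines between sections, sentence-ending punctuation between sentences), the `MRN` record-number format, and the function names are hypothetical stand-ins; the article does not say how MeDS implements these steps.

```python
import re

# Hypothetical identifier patterns, matched against whole words.
PATTERNS = {
    "SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "MRN": re.compile(r"MRN\d{6,}"),
}


def segment(report):
    """Split a report into sections, then sentences, then words."""
    sections = [s.strip() for s in re.split(r"\n\s*\n", report) if s.strip()]
    return [
        [sentence.split() for sentence in re.split(r"(?<=[.!?])\s+", section)]
        for section in sections
    ]


def mask(word):
    """Replace a word with a tag if it matches an identifier pattern."""
    core = word.rstrip(".,;:")
    tail = word[len(core):]  # keep trailing punctuation
    for label, pattern in PATTERNS.items():
        if pattern.fullmatch(core):
            return f"[{label}]{tail}"
    return word


def deidentify(report):
    """Rebuild the report from masked words (whitespace is normalized)."""
    return "\n\n".join(
        " ".join(" ".join(mask(w) for w in words) for words in section)
        for section in segment(report)
    )


report = "Patient MRN1234567 was admitted. SSN is 123-45-6789.\n\nNo acute disease."
print(deidentify(report))
```

Working at the word level keeps each pattern simple, at the cost of missing identifiers that span several words, which a fuller system would have to handle at the sentence level.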

MeDS is being used experimentally to de-identify textual messages, including discharge summaries and radiology, laboratory, and pathology reports.

It has not been used to scrub digital images, although the system is designed to be easily modified and adapted to different use cases, Friedlin said.

"If the digital image contains textual information that can be read by a computer, MeDS could be adapted to de-identify it," he said.

This is not the first software program designed to automatically remove patient identifiers from medical records, but Friedlin said MeDS is both broader and more accurate.

"MeDS is the first system described in the literature that attempts to detect and eliminate misspelled names," he said.

For example, MeDS is able to find and delete misspellings like "SSmith," "Smithh," "Smmith," or even "mith."
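Each of those misspellings is one inserted or deleted letter away from "Smith," so one plausible way to catch them is an edit-distance check against a list of known names. The Python sketch below illustrates that idea only; the article does not describe the method MeDS actually uses, and `known_names`, the `[NAME]` tag, and the distance threshold of 1 are assumptions.

```python
import re


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def scrub_names(text, known_names, max_distance=1):
    """Mask any alphabetic token within max_distance edits of a known name."""
    names = [name.lower() for name in known_names]

    def replace(match):
        token = match.group(0).lower()
        if any(edit_distance(token, name) <= max_distance for name in names):
            return "[NAME]"
        return match.group(0)

    return re.sub(r"[A-Za-z]+", replace, text)


print(scrub_names("Seen by Dr. SSmith, then Smmith and mith.", ["Smith"]))
```

A threshold this loose can also mask ordinary words that happen to sit one edit away from a short name, so a practical system would need some way to limit such false positives.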

The system is still in the development stage but has been tested on patient data housed in the Regenstrief archive, which contains 35 years' worth of records.

"While it appears to be very accurate in de-identifying medical documents, we are performing additional studies involving greater numbers and varieties of medical reports to ensure it is generalizable across the entire medical domain," Friedlin said.

The institute is formulating plans to make MeDS available, or possibly to offer a de-identification service, Friedlin said.