October 5, 2007

Informatics association challenge elicits de-identification solutions

Medical records are an important source for clinical researchers, but text records used outside the hospital must first be de-identified.

Several automatic schemes to achieve de-identification recently surfaced as the result of a challenge issued jointly by the American Medical Informatics Association and i2b2 (informatics for integrating biology and the bedside), an NIH-funded National Center for Biomedical Computing project based at Partners HealthCare System. An overview of participating systems, details of evaluation metrics, and findings can be found in the Association's journal (J Am Med Inform Assoc 2007;14:550-563).

In one effort, researchers at the University of Szeged in Hungary developed a de-identification model that successfully removes personal health information from hospital records, in conformance with the Health Insurance Portability and Accountability Act.

The system is a machine learning-based, iterative method that applies named entity recognition to semistructured documents. Named entity recognition (NER) is a subtask of information extraction that locates and classifies elements in text into predefined categories.
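To make "locates and classifies" concrete, here is a minimal Python sketch. It is not the Szeged model, which is machine-learned; it uses hand-written regular expressions instead. The two categories (DATE, PHONE), the patterns, and the function names are assumptions chosen only for illustration.

```python
import re

# Hypothetical categories and patterns, chosen only for illustration.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

def find_entities(text):
    """Locate entities: return (start, end, category, surface) sorted by position."""
    found = []
    for category, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.end(), category, match.group()))
    return sorted(found)

def redact(text):
    """Replace each located entity with its category label."""
    out, cursor = [], 0
    for start, end, category, _ in find_entities(text):
        out.append(text[cursor:start])
        out.append(f"[{category}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)

print(redact("Seen on 10/05/2007, call 555-123-4567."))
```

A learned NER system replaces the fixed patterns with a classifier over token features, but the output has the same shape: text spans paired with category labels.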

"Our named entity approach is based on a complex feature set and boosted decision trees, and it uses a different feature representation from other state-of-the-art NER systems," said György Szarvas, Ph.D., of the university's informatics department.

Szarvas's method identifies personal health information in several steps (J Am Med Inform Assoc 2007;14:574-580). First, it labels all entities whose tags can be inferred from the structure of the text. It then uses this information to find further personal health information phrases in the free-text portions of the document.
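The two-pass idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the published system: the article says the real model relies on boosted decision trees and a complex feature set, whereas this sketch uses plain string matching. The field names, the tag names, the record layout, and the sample names are all made up.

```python
import re

# Hypothetical structured header fields and the tag each one implies.
FIELD_TAGS = {"Patient": "PATIENT", "Doctor": "DOCTOR"}
FIELD_LINE = re.compile(r"^(\w+):\s*(.+)$")

def collect_structured_entities(record):
    """Pass 1: read tags directly from 'Field: value' header lines."""
    entities = {}
    for line in record.splitlines():
        match = FIELD_LINE.match(line)
        if match and match.group(1) in FIELD_TAGS:
            entities[match.group(2).strip()] = FIELD_TAGS[match.group(1)]
    return entities

def deidentify(record):
    """Pass 2: reuse the pass-1 entities to tag mentions everywhere in the record."""
    entities = collect_structured_entities(record)
    # Replace longer phrases first so a full name wins over a substring of it.
    for phrase in sorted(entities, key=len, reverse=True):
        record = re.sub(re.escape(phrase), f"[{entities[phrase]}]", record)
    return record

# Made-up sample record: two header lines followed by free text.
record = "Patient: Jane Roe\nDoctor: John Smith\n\nJane Roe was seen by John Smith today."
print(deidentify(record))
```

In this sketch the header lines supply the labels for free, and the narrative sentence is de-identified only because the same names were already seen in the structured part.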

Szarvas said that customizing his system took only a few weeks.

"Such systems can be built quite rapidly for any institute for de-identification or other NER-like tasks," he said.

Elsewhere, researchers at MITRE (Bedford, MA), along with Harvard, Brandeis, and Stanford universities, took a different approach, focusing instead on rapid adaptation of existing toolkits for named entity recognition. They used two existing tools: Carafe and LingPipe.

The researchers report that the out-of-the-box Carafe system achieved a good score (phrase F-measure of 0.9664) with only four hours of work to adapt it to the de-identification task (J Am Med Inform Assoc 2007;14:564-573). With further tuning, they were able to reduce the token-level error term by over 36% through task-specific feature engineering and the introduction of a lexicon, achieving a phrase F-measure of 0.9736.
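For readers unfamiliar with the metric, the Python sketch below shows how an F-measure and a relative error reduction are computed. It assumes the balanced F-measure (the harmonic mean of precision and recall), which the article does not spell out, and the function names are illustrative. Only the two phrase-level scores come from the article.

```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def relative_error_reduction(score_before, score_after):
    """Fraction of the remaining error (1 - score) removed by an improvement."""
    error_before = 1.0 - score_before
    error_after = 1.0 - score_after
    return (error_before - error_after) / error_before

# Phrase-level F-measures reported in the article.
reduction = relative_error_reduction(0.9664, 0.9736)
print(f"{reduction:.1%}")  # prints 21.4%
```

The phrase-level scores alone correspond to roughly a 21% cut in error. The reduction of over 36% cited above is measured on token-level error, a different quantity whose underlying scores are not given here.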

Challenge organizer Özlem Uzuner, Ph.D., of the University at Albany, SUNY, said the efforts show that most private health information can be de-identified with more than 98% accuracy.

Whether 98% accuracy is good enough is an open question best left to policymakers, she said.

"The results are nevertheless encouraging from a technical perspective and show that much can be accomplished to de-identify data with the best techniques," Uzuner said.

AMIA was so encouraged that it is now organizing similar challenges on other open research questions in medical language processing.
