On the edge of a digital black hole

November 14, 2008

It seems like we have been waiting forever for PACS to arrive. The first of these systems appeared three decades ago, an anomaly on the RSNA exhibit floor. They were the harbingers of a technology whose use, unlike that of CT or MR scanners, produced no revenue. Their adoption required so much: an embrace of efficiency, the digitization of radiography, and a willingness to soft-read every imaging study.

It would be nice to think this digital evolution was driven by medical need or insight, but really it was the inevitable result of a broad-based, consumer-driven migration to digital everything -- photos, videos, music, mail, even tax filings.

At last count, the world had accumulated 369 exabytes of data. Put in numbers more easily appreciated: 369 quintillion bytes. That's 369 followed by 18 zeroes.

I'd be willing to bet a quintillion or two of these data are already so far out of date they can no longer be read: data archived in WordPerfect on eight-inch floppy disks, for example. And it's not just personal computer-based obsolescence.

Some of the data collected from NASA's 1976 Viking landing on Mars are unreadable and apparently lost forever. Closer to home, the digitally archived data from the 1960 U.S. Census are either gone or close to it. Only two machines, one in Japan and the other in the Smithsonian Institution, could read the data, according to the National Archives' website, and that was in the mid-1970s.

Today, we are sending all sorts of data, including medical reports, by e-mail, a medium that has become the communications backbone of business and government around the world.

"If that information is lost, you've lost the archive of what has actually happened in the modern world," said Jerome P. McDonough, an assistant professor in the graduate school of library and information science at the University of Illinois at Urbana-Champaign. "We've seen a couple of examples of this so far."

McDonough cites a missing White House e-mail archive from the run-up to the Iraq War, a violation of the Presidential Records Act.

"With the current state of the technology, data are vulnerable to both accidental and deliberate erasure," he said. "What we would like to see is an environment in which we can make sure that data do not die due to accidents, malicious intent, or even benign neglect."

There will always be some risk, regardless of whether data are analog or digital. President Richard Nixon's personal secretary Rose Mary Woods erased 18.5 minutes of a 1970s audiotape, with consequences that will be noted for generations to come.

With so much medical data now at stake, however, the risk of digital loss due to obsolete file formats or degrading media is not acceptable. The rapid march of technology can overwhelm archival media in a decade or less, endangering medical data long before the patient is done using them.

Avoiding a black hole of data inaccessibility will require vigilance and forward planning. We'll need to migrate data to new formats and develop methods for getting old software to work on new platforms, McDonough said. The best approach may be to create and rely on open-source file formats and software.

This would require unprecedented collaboration. Proprietary software is the lifeblood of U.S. business. It is what gives companies an edge on their competitors. But ensuring that medical imaging data survive will require finding a balance between this fundamental capitalist demand and the need for future access to data.