Data migration challenges PACS vendors and users

May 8, 2003

A PACS contains different types of data and databases, which have traditionally been stored in different formats. They include image data, demographic data, and DICOM data, as well as functional data such as image enhancement or manipulation performed by the radiologist. Image data may be treated as objects, while patient data may be stored as text. These variables can become major obstacles when migrating data from one archive or one type of storage media to another.

Vendors often use proprietary formats to store data, even when complying with DICOM requirements. DICOM is intended to ensure interoperability, specifying standard formats for data entering and exiting the archive and workstations. Proprietary data management methods and compression schemes are often employed to improve functionality and speed of operation. The DICOM standard does not address data migration, and compliance with DICOM does not guarantee the ability to migrate data.

Data migration is an important issue because archive technologies and storage media are the fastest changing PACS components, and the new technologies offer efficiencies and capabilities that can reduce PACS costs and improve productivity. These new technologies also offer more scalability, greater compliance with regulations, and increased uptime and reliability. Data migration is essential not only to those who invested in early PACS technologies, but also to those acquiring PACS technologies today.

As users and vendors confront the serious challenge of migrating data and database information from existing archives to current state-of-the-art technology, they must also anticipate the requirement for migration from today's technology to future technologies and storage media. PACS vendors should be required to guarantee the ability to migrate all data and database information to a new archive system so the PACS user can recreate the studies in the original diagnostic form.

Over the last five years, PACS archive architecture has evolved from multitier systems with multiple types of storage media to single-tier architectures with all studies stored on spinning disks. The multitier systems in early PACS use a combination of digital storage arrays, jukeboxes, media, and shelf storage, and they require multiple types of data readers and retrieval methods. The newer single-tier architecture offers many advantages.

Medical images typically must be retained for a seven-year period, although longer retention is required for pediatric and mammography images. In the past, multitiered storage systems were used to address the high cost of storing large data sets and to assist in delivering acceptable functionality. Since online access and faster retrieval speeds correlated with higher costs, studies were prioritized based on the speed/performance required to match clinical need. Multitiered archives used a mix of storage media, keeping current and related studies on the more expensive online RAID storage devices. This enabled access to current and prefetched studies in one to three seconds.

The second level of storage used midpriced, near-line jukeboxes with optical disk, magneto-optical disk, tape, CD, DVD, advanced intelligent tape, digital linear tape, or 9840 tape media. Access time at this level is one to three minutes. Long-term storage, or what is commonly referred to as the deep archive, stores the remaining studies on lower cost offline tape archives or shelf storage requiring manual retrieval. Retrieval from a deep archive can take 20 to 30 minutes and sometimes hours for a single study. In this multitiered, multimedia storage scenario, a hierarchical storage management (HSM) system is required to direct studies to and from specific storage tiers.
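The routing decision an HSM makes can be sketched in a few lines of Python. This is only an illustration: the tier names, the age thresholds, and the function name are assumptions made for the example, and a real HSM policy is site-specific. The access times are the figures quoted above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Tier:
    name: str
    typical_access: str  # access time as quoted in the article


# The three tiers described in the text; the names are illustrative.
ONLINE = Tier("online RAID", "1 to 3 seconds")
NEAR_LINE = Tier("near-line jukebox", "1 to 3 minutes")
DEEP = Tier("deep archive", "20 to 30 minutes or more")

# Hypothetical age thresholds, chosen only for this example.
ONLINE_MAX_DAYS = 90
NEAR_LINE_MAX_DAYS = 730


def assign_tier(study_date: date, today: date, prefetched: bool = False) -> Tier:
    """Pick a storage tier from study age; prefetched studies stay online."""
    age_days = (today - study_date).days
    if prefetched or age_days <= ONLINE_MAX_DAYS:
        return ONLINE
    if age_days <= NEAR_LINE_MAX_DAYS:
        return NEAR_LINE
    return DEEP
```

Every study needs a rule like this, plus a database recording where each study currently sits, which is part of what a single-tier archive eliminates.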

Although these archives and media have a storage life of approximately 30 years, newer media and read drives are cheaper and faster and perform the required functions more efficiently. A PACS implemented today can have single-tier architecture with all studies online all the time. A PACS solution under consideration should provide a method to migrate data from any media type to another in an inexpensive and timely manner.


The archive is the heart and brains of the PACS. It is the intelligence command center that stores image pixel data, patient demographic data, DICOM data, and media management data, and it communicates with all other system components. It must store and manage procedure information and scheduling data from the RIS, match demographic and procedure information from modalities, organize image and text data into patient folders, find and retrieve studies, and route studies to defined locations. In addition, it must perform data management functions to ensure functionality and integrity of data and provide the required backup functions for redundancy and disaster recovery.

The archive is also the most expensive component of a PACS, and its functionality will make or break a PACS investment. Archive characteristics should be the most important criteria for system selection in purchase of a PACS. Archive architecture, type of media used, type of databases employed (relational or object-oriented), functional workflow, migration path, level of security, disaster recovery plan, and compliance with the Health Insurance Portability and Accountability Act are critical factors. All PACS workflow, functionality, and performance are dependent on the capability of the archive. Radiologists often focus on workstation performance when choosing a PACS, but the workstation is only a node off the network, and its functionality depends mainly on the support provided by the archive.

The archive and networks typically fall into the information technology domain. It is recommended that the radiology department work with IT personnel in the selection, implementation, and maintenance of the archive and network. Doing so also presents the opportunity to have a single image archive supporting multiple departments within the healthcare enterprise. In some cases, IT has taken the responsibility for selecting, supporting, and maintaining the image archive.

A central image archive supporting multiple locations and image types has these additional benefits:

- IT has skill sets and resources to support the archive and network.

- Radiology only has to support the PACS application.

- The number of archives and databases in the hospital is reduced.

- Fewer FTEs are required to support the system 24/7.

- It is easier to meet fault tolerance, redundancy, and high availability requirements.

- It is easier to achieve 99.99% uptime for both archive and network.

- It is easier to manage security.

- It is easier to become HIPAA-compliant.

- It provides better disaster recovery and continuation of business capabilities.
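To put the 99.99% uptime figure in concrete terms, the downtime it permits can be worked out directly. The function name below is illustrative, and the calculation assumes a 365-day year of round-the-clock operation.

```python
def allowed_downtime_minutes(uptime_percent: float, days: int = 365) -> float:
    """Minutes of downtime permitted over `days` of 24-hour operation."""
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0
```

At 99.99%, this comes to about 52.6 minutes of downtime per year for the archive and network combined.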

The purpose of the medical image archive is to store and retrieve images, query the contents of the archive, and exchange data with other information systems. DICOM supports that purpose by providing standards for storage of DICOM images, for index or directory of images for rapid query, and for enabling storage and retrieval.

DICOM permits interoperability between applications. Specific DICOM information object definitions provide a structure for communication of image data and related information. Image, patient, DICOM, and functional data (window and level, annotations, teaching file tags, and image enhancement or manipulation actions used by the radiologists for diagnosis) are stored in different types of databases (relational and object-oriented). In addition, a multitiered/multimedia archive requires its own database to manage the movement of studies from one tier to another; this is the HSM database.

A multitiered PACS archive thus has three distinct databases: one for demographic data providing information on what studies are available for viewing, a second for the storage of raw image data, and a third for the HSM that tracks and manages studies between storage tiers and devices.

Medical images consist of DICOM data containing patient demographic information and image data containing pixel information. Patient data are text-based, typically in the low kilobyte range, and are usually managed by a commercial relational database management system. Medical image data are different because of their size (5 MB to 1 GB per study, and getting larger), and object-oriented databases are used for storage and retrieval. Pixel data are not used in a DICOM query; therefore, it is useful to separate the DICOM header data from the image pixel data and to use different mechanisms for the storage and retrieval of each data type.
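The separation of header data from pixel data can be illustrated with a minimal Python sketch. It assumes a hypothetical index schema (the table and column names are invented for the example) and uses SQLite and a plain directory as stand-ins for the commercial relational database and the object store; actual PACS schemas are vendor-specific.

```python
import sqlite3
from pathlib import Path

# Hypothetical index schema: searchable header fields plus a pointer
# to where the pixel data live. Not taken from any real PACS.
SCHEMA = """
CREATE TABLE image_index (
    sop_uid    TEXT PRIMARY KEY,
    patient_id TEXT NOT NULL,
    study_date TEXT NOT NULL,
    modality   TEXT NOT NULL,
    pixel_path TEXT NOT NULL
)
"""


def open_index() -> sqlite3.Connection:
    """Create an in-memory relational index for header data."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    return db


def store_image(db: sqlite3.Connection, pixel_dir: Path,
                header: dict, pixels: bytes) -> None:
    """Write pixels to the object store and index the header fields."""
    pixel_path = pixel_dir / (header["sop_uid"] + ".raw")
    pixel_path.write_bytes(pixels)
    db.execute(
        "INSERT INTO image_index VALUES (?, ?, ?, ?, ?)",
        (header["sop_uid"], header["patient_id"], header["study_date"],
         header["modality"], str(pixel_path)),
    )


def query_patient(db: sqlite3.Connection, patient_id: str) -> list:
    """Answer a query from header data alone; no pixel data are read."""
    rows = db.execute(
        "SELECT sop_uid, pixel_path FROM image_index "
        "WHERE patient_id = ? ORDER BY sop_uid",
        (patient_id,),
    )
    return rows.fetchall()


def load_pixels(pixel_path: str) -> bytes:
    """Fetch pixel data only once a specific image is requested."""
    return Path(pixel_path).read_bytes()
```

The sketch also shows why migration is harder than a file copy: the index rows and the pixel files must move together, and the pointers between them must remain valid on the new system.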

Because the DICOM data add overhead to each image every time a command is initiated, vendors use proprietary software for the management of these data. Proprietary software and, in many cases, some rather clever data management methods and compression schemes speed up the system's operation and improve functionality.

Interactive applications such as annotations, window/level settings, and teaching-file tags are also stored differently by various vendors and are sometimes stored separately from the images. Migration of image/pixel data without the interactive applications will enable display of the raw image, but will not allow recreation of the way the study was displayed for diagnosis. Medicolegal problems could result if the hospital cannot recreate the image as it appeared when the radiologist performed the diagnosis.
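One way to guard against this during a migration is to compare each migrated study with its source, checking display state as well as images. The Python sketch below assumes a hypothetical, simplified record structure; in practice vendors store annotations and window/level settings in their own formats, so the field names here are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class StudyRecord:
    """Hypothetical view of one study as held by an archive."""
    study_uid: str
    image_uids: set = field(default_factory=set)
    # Display state keyed by image UID.
    window_level: dict = field(default_factory=dict)  # uid -> (center, width)
    annotations: dict = field(default_factory=dict)   # uid -> list of notes


def migration_gaps(source: StudyRecord, target: StudyRecord) -> list:
    """List what the migrated study lacks relative to the source."""
    gaps = []
    for uid in sorted(source.image_uids):
        if uid not in target.image_uids:
            gaps.append(f"{uid}: image missing")
            continue
        if source.window_level.get(uid) != target.window_level.get(uid):
            gaps.append(f"{uid}: window/level differs")
        if source.annotations.get(uid, []) != target.annotations.get(uid, []):
            gaps.append(f"{uid}: annotations differ")
    return gaps
```

A migration that carries only pixel data would pass an image-count check yet still be flagged here, which is the situation the medicolegal concern describes.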

The following are variations in how PACS databases can be configured:

- DICOM data are stored in relational databases using structured query language (SQL) but can also be stored in an object database.

- Pixel data are stored in object-oriented databases, which don't have a standard query model.

- Pixel data can be stored in relational databases, but retrieval is too slow for practical use.

- The HSM query model supports both relational and object databases.

- Relational databases do not represent the relationships among objects as well as object databases do; keys are required to link related records.

- Relationships among patients, studies, and images function naturally with object databases.

Despite the variety of methods used to store data and databases and the reliance on proprietary software, a PACS can still be DICOM-compliant. A mix of data types and proprietary software that is internal to PACS operations can still meet DICOM part 10 storage requirements. The important point is to ensure that the vendor provides the ability to migrate all data and database information to a new archive system, with the ability to recreate studies in their original diagnostic form.


Migrating PACS data is not like changing film or printer suppliers. It requires a thorough understanding of how the archive and media function. The ability to migrate patient and image data from multiple databases and archive configuration scenarios to new archive technology presents serious challenges for existing PACS users and vendors. This applies even when migrating data to a new archive from the same PACS vendor.

The image databases must support the ability to adapt over time. Any storage media will be unsupported and obsolete before image retention requirements have been satisfied. A validated data migration procedure is thus a critical aspect of a PACS decision and contract.

The cost of data migration can be minimized and the process greatly simplified if the vendor can meet the following criteria:

- data can be evolved from older media to new media types;

- archive manager software will be able to continue operating when storage technology changes; and

- the three databases are designed to support migration.
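A validated migration of the stored files themselves can be sketched as a copy followed by a checksum comparison. The Python example below is a minimal illustration under simplifying assumptions: both archives are reachable as ordinary directories, and the function names are invented for the example. It covers only the file-level step, not the conversion of database records.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large studies need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_and_verify(old_root: Path, new_root: Path) -> list:
    """Copy every file under old_root to new_root.

    Returns the relative paths whose copies failed checksum verification.
    """
    failed = []
    for src in sorted(p for p in old_root.rglob("*") if p.is_file()):
        rel = src.relative_to(old_root)
        dst = new_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        # Re-read both files so a bad write is detected.
        if sha256_of(src) != sha256_of(dst):
            failed.append(rel)
    return failed
```

An empty result means every file was copied and verified. Any path returned should be recopied before the old media are retired.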

Using a single-tier archive with all data stored on spinning disks has become economically feasible and preferable. Having all images online all the time greatly enhances performance, streamlines workflow, and makes data migration less costly and time-consuming and much easier to perform.

Mr. Reed is president of Integration Resources in Lebanon, NJ.