New strategy allows computerized search of digital image databases

July 30, 2001

University researchers have come up with a new way to automatically sort, classify, and retrieve digital images. Their method promises faster, more accurate database searches for applications such as radiologic data mining.

The new approach, called SIMPLIcity, looks at images much as people do. Just as a person shown a picture of a horse can extract features characteristic of horses and then identify other pictures that contain horses, the computer-based system extracts characteristic features from an image and uses them to find similar images, said its developer, James Z. Wang, Ph.D., an assistant professor of information sciences and technology at Pennsylvania State University who specializes in medical informatics.

Using wavelets and a novel technology called integrated region matching (IRM) that performs region-based image similarity comparison, the system retrieves relevant images from any image database or from the Web on the basis of automatically derived image features or content.

Images are represented by a set of regions, roughly corresponding to objects, characterized by features reflecting color, texture, shape, and location properties, according to Wang. IRM evaluates overall similarities between images, incorporating properties of all the regions in the images using the region-matching scheme.
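The article does not spell out how the region-matching scheme works, so the following is only a minimal sketch of one way a region-based comparison could be written. It assumes each image is a list of (weight, feature vector) pairs whose weights sum to 1, and it uses a greedy rule that matches the most similar region pairs first. The function name, the weights, and the greedy rule are illustrative assumptions, not the published SIMPLIcity implementation.

```python
from itertools import product
from math import dist


def region_matching_distance(regions_a, regions_b):
    """Overall distance between two images described as weighted regions.

    Each region is (weight, feature_vector). Weights are assumed to sum
    to 1 per image (for example, the fraction of image area covered).
    Hypothetical greedy scheme: the closest region pairs are matched
    first, and each match uses up as much weight as both regions have left.
    """
    left_a = [w for w, _ in regions_a]
    left_b = [w for w, _ in regions_b]

    # Every cross-image region pair, ordered from most to least similar.
    pairs = sorted(
        (dist(fa, fb), i, j)
        for (i, (_, fa)), (j, (_, fb)) in product(
            enumerate(regions_a), enumerate(regions_b)
        )
    )

    total = 0.0
    for d, i, j in pairs:
        share = min(left_a[i], left_b[j])
        if share > 0:
            total += share * d  # weighted contribution of this match
            left_a[i] -= share
            left_b[j] -= share
    return total
```

Because every region contributes in proportion to its weight, a single poorly segmented region has limited effect on the overall score, which is the general appeal of comparing all regions rather than picking one.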

"The system processes each image in the database, extracts key features, indexes the features in the feature space, and retrieves images based on the feature comparison scheme - all of which enables users to search for visually related images from massive image databases," Wang said.
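The steps Wang describes (extract features, index them, retrieve by feature comparison) can be illustrated with a toy pipeline. This is a sketch under stated assumptions: the mean-color feature, the dictionary index, and the function names are hypothetical stand-ins chosen for brevity, and the real system derives far richer wavelet-based region features.

```python
from math import dist


def extract_features(pixels):
    """Placeholder feature: the mean (r, g, b) color of the image.

    Assumption for illustration only; not the features SIMPLIcity uses.
    """
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def build_index(images):
    """Map each image name to its feature vector (done once, offline)."""
    return {name: extract_features(pixels) for name, pixels in images.items()}


def retrieve(index, query_pixels, k=3):
    """Return the names of the k indexed images closest to the query."""
    query = extract_features(query_pixels)
    ranked = sorted(index, key=lambda name: dist(index[name], query))
    return ranked[:k]


images = {
    "red": [(255, 0, 0), (250, 5, 5)],
    "green": [(0, 255, 0)],
    "blue": [(0, 0, 255)],
}
index = build_index(images)
best = retrieve(index, [(240, 10, 10)], k=1)
```

The point of the split is that feature extraction runs once per stored image, so a query only pays for one extraction plus the comparisons against the index.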

SIMPLIcity has been validated on a database of about 200,000 general-purpose images and an archive of more than 70,000 digital pathology images.

"The system is targeted to retrieve electronic medical images, although it also works well for general-purpose photographs," he said. "Medical image retrieval is a lot more demanding in terms of accuracy."

Other potential applications include education, biomedicine, crime prevention, the military, commerce, entertainment, and Web image classification, Wang said.

Image retrieval techniques in commercial use rely mostly on key words or descriptions. While these text-based approaches are accurate and efficient for limited databases, manually entering descriptions becomes prohibitively expensive at larger scales, such as image databases containing an astronomical number of observations or radiographs. The new approach eliminates the need to input textual information.

To view a demonstration of SIMPLIcity, go to http://wang.ist.psu.edu/IMAGE .