CAD comes under scrutiny in breast screening debate

October 29, 2010
Rossano Girometti, MD, Lorenzo Cereser, MD, Massimo Bazzocchi, MD, Chiara Zuiani, MD

Diagnostic Imaging Europe, Volume 26, Issue 7

Computer-aided detection (CAD) tools use software to analyze digital or digitized images to find features associated with the target disease. The objective of CAD is to mark suspicious findings on the monitor and/or on a print of the image to help radiologists detect lesions (Figure 1).1 The first CAD tools were approved by the U.S. Food and Drug Administration for clinical use in 1998. Several commercial and noncommercial CAD systems have since become available.2

CAD algorithms work by identifying areas of signal in images that may contain a cancer. The standard approach is to use thresholding algorithms to identify as many true signals and as few false signals as possible. Signal data are separated from the background in a process known as segmentation. The signals are then subjected to a probabilistic analysis to assess the likelihood that the structure on the image contains malignancy-induced abnormalities.2 The final result is a CAD prompt if the probability of cancer being present is sufficiently high.
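The steps described above (thresholding, segmentation, probabilistic scoring, prompting) can be illustrated with a toy sketch. It is not the algorithm of any commercial CAD product: the function names, the 4-connected flood fill, the threshold values, and in particular the toy_score function, which merely rescales mean brightness, are assumptions made for illustration. A real system would replace the scoring step with a classifier trained on proven cases.

```python
from collections import deque


def segment(image, threshold):
    """Group pixels brighter than `threshold` into 4-connected regions."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] <= threshold or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited bright pixel.
            queue, region = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and image[ny][nx] > threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def toy_score(image, region):
    """Hypothetical stand-in for the probabilistic analysis (range 0 to 1)."""
    return sum(image[y][x] for y, x in region) / (255 * len(region))


def cad_prompts(image, threshold=100, min_score=0.6):
    """Return one (row, col) prompt per region scoring at least `min_score`."""
    prompts = []
    for region in segment(image, threshold):
        if toy_score(image, region) >= min_score:
            centre_row = sum(y for y, _ in region) // len(region)
            centre_col = sum(x for _, x in region) // len(region)
            prompts.append((centre_row, centre_col))
    return prompts


# Invented 5 x 5 "image" with one bright cluster and one faint isolated pixel.
image = [
    [10, 10, 10, 10, 10],
    [10, 200, 210, 10, 10],
    [10, 190, 220, 10, 10],
    [10, 10, 10, 10, 120],
    [10, 10, 10, 10, 10],
]
print(cad_prompts(image))  # [(1, 1)]
```

Both bright areas survive segmentation, but only the larger cluster scores high enough to be prompted. Lowering min_score would prompt the faint pixel as well, which is the sensitivity versus false-positive trade-off in miniature.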

All CAD systems must also be trained on a database of real cases before they are used in clinical practice. These cases should ideally be proven by biopsy or, if not, by follow-up of at least two years. The incidence and type of disease should mirror that expected in the target population.1 A basic method for assessing CAD performance during the training period is to plot the free-response receiver operating characteristic (FROC). This represents the sensitivity of the system in detecting cancer as a function of the averaged number of false-positive marks on each image. The best performing CAD systems will have a low false-positive rate as well as high sensitivity.2
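As a rough illustration of how FROC points are obtained, the sketch below sweeps a score threshold over a set of CAD marks and records sensitivity against the average number of false-positive marks per image. The function name, the input format, and the sample marks are hypothetical. The sketch also assumes that every true-positive mark corresponds to a different lesion and that no two marks share a score.

```python
def froc_points(marks, n_lesions, n_images):
    """Return (false positives per image, sensitivity) at each score threshold.

    marks: list of (score, is_true_positive) pairs, one per CAD mark.
    n_lesions: number of proven cancers in the data set.
    n_images: number of images in the data set.
    """
    points = []
    true_pos = false_pos = 0
    # Lowering the threshold admits marks in order of decreasing score.
    for _score, is_true_positive in sorted(marks, key=lambda m: m[0], reverse=True):
        if is_true_positive:
            true_pos += 1
        else:
            false_pos += 1
        points.append((false_pos / n_images, true_pos / n_lesions))
    return points


# Invented example: five marks over two images containing four cancers.
marks = [(0.9, True), (0.8, False), (0.7, True), (0.4, False), (0.3, False)]
points = froc_points(marks, n_lesions=4, n_images=2)
for fp_per_image, sensitivity in points:
    print(f"{fp_per_image:.1f} false positives/image -> sensitivity {sensitivity:.2f}")
```

A better system reaches a given sensitivity further to the left on this curve, that is, with fewer false-positive marks per image.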

BREAST SCREENING SCENARIO

CAD tools are well-suited to mammographic screening for breast cancer. Screening has been shown to be a cost-effective way of detecting (and thus treating) breast cancer at an early stage. It is, however, associated with a false-negative rate of up to 25%.1 False negatives are cancers that are overlooked at the baseline examination and either appear as interval cancers or are detected on subsequent screening mammograms. Between 27% and 70% of these cancers turn out to be visible on a retrospective analysis of the baseline mammogram.1

The high false-negative rate in breast screening has been attributed to the lower prevalence of disease compared with the clinical scenario, with fewer and smaller lesions. Small carcinomas detected during screening have less obvious mammographic features than symptomatic lesions, so screening is a highly demanding task involving a detailed visual search for subtle signs.1,3

One solution to the problem of false negatives is to have two radiologists reading mammograms independently, with a consensus review if their opinions on recall differ. This practice of double-reading has been implemented at many institutions in Europe. It can reduce the false-negative rate by 15%, but at a high cost in terms of human and material resources.1

An alternative solution is to replace the second reader with CAD. This is based on the principle that software algorithms provide a constant efficiency over time and will allow lesions overlooked by a single reader to be detected reliably. CAD is also cheaper than hiring a second reader.4 This option is more common in U.S. breast clinics and hospitals than in Europe.

The main measure of a CAD system’s performance is its intrinsic sensitivity in detecting cancer. CAD prompts indicate either true cancers (true positives; TP) or regions that the algorithm mistakenly believes to be abnormal (false positives; FP). Cancers that receive no prompt are false negatives (FN). CAD must typically have a high intrinsic sensitivity, TP/(TP + FN), to increase radiologists’ sensitivity and thereby improve the cancer detection rate.

The advantage of high sensitivity is balanced by the false-positive rate. CAD tools typically provide several marks per case, most of which are false-positive calls. The higher the sensitivity of the CAD package, the greater the number of prompts on the screen, and the higher the number of false positives.1 This decrease in specificity means that more patients in the screening program could be recalled for further investigation and more needless breast biopsies could be performed. In other words, the positive predictive value of the screening test is expected to decrease. It has been estimated that for every 1% increase in the recall rate, the rate of cancers detected rises by 0.022%.4
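To put the last estimate in concrete terms, the short calculation below scales it to a hypothetical cohort of 100,000 screened women. It assumes that both percentages are expressed relative to all women screened; the cohort size and variable names are illustrative only.

```python
screened = 100_000  # hypothetical cohort size

# A 1% rise in the recall rate means 1 extra recall per 100 women screened.
extra_recalls = screened * 1 // 100

# A 0.022% rise in detection means 22 extra cancers per 100,000 women screened.
extra_cancers = screened * 22 // 100_000

recalls_per_extra_cancer = extra_recalls / extra_cancers

print(extra_recalls, extra_cancers, round(recalls_per_extra_cancer, 1))
# 1000 22 45.5
```

Under these assumptions, roughly 45 additional women are recalled for every additional cancer found.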

Intrinsic CAD performance is, then, a compromise between sensitivity and specificity. Any adjustments are usually made in favor of raising the sensitivity.
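The compromise can be made concrete with the standard definitions of sensitivity, specificity, and positive predictive value. In the sketch below, the function name and the two sets of counts are invented for illustration and are not taken from any study; each describes a hypothetical 1000 women screened, eight of whom have cancer.

```python
def screening_metrics(tp, fn, fp, tn):
    """Compute standard screening measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # cancers found / all cancers
        "specificity": tn / (tn + fp),  # healthy not flagged / all healthy
        "ppv": tp / (tp + fp),          # cancers found / all positive calls
    }


# Hypothetical operating points for the same 1000 women (8 cancers).
conservative = screening_metrics(tp=6, fn=2, fp=40, tn=952)
sensitive = screening_metrics(tp=7, fn=1, fp=90, tn=902)

for name, metrics in (("conservative", conservative), ("sensitive", sensitive)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

In this invented example, finding one more cancer raises sensitivity from 0.75 to 0.875, but the extra false positives lower both specificity and positive predictive value.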

CLINICAL IMPACT

The analysis of digital radiography and CAD in breast screening programs is incomplete. No randomized controlled trials have been performed to assess changes in survival. The performance measures described above are consequently used as surrogate endpoints to evaluate clinical performance.4

The studies published in this area fall into two categories: retrospective and prospective.1 Retrospective studies are designed to evaluate the sensitivity of CAD on previous screening mammograms with a view to checking if the software would have picked up missed cancers. An overview of results (Table 1)5-10 shows that CAD provides adequate sensitivity in detecting cancer, and that this sensitivity is higher for microcalcifications than for masses. These studies also show that CAD can help radiologists by retrieving missed cancers.

Retrospective studies are, however, biased by the higher prevalence of cancer in study populations compared with the screening setting, and by the incorrect assumption that radiologists always accept prompted suggestions. Contrary to this assumption,1 readers tend to ignore the majority of CAD marks, and up to 16% of the lesions they detect are missed by the software system (Figure 2).11 It is consequently recommended that radiologists use CAD only after the preliminary reading, and that if CAD fails to mark an area the reader thought was suspicious, the patient should still be recalled.11

Prospective clinical trials investigating CAD in a screening scenario require larger populations to be studied. These will, however, lead to a more accurate estimation of CAD’s impact on readers and to better predictions of recall rates. Studies can be matched (radiologists read mammograms before and after CAD) or unmatched (periods of clinical practice without CAD and then with CAD are compared). The results tend to show that CAD leads to an increase in both the cancer detection rate and the recall rate (Table 2).11-20

The clinical value of CAD remains a topic for debate. Some trials and meta-analyses suggest that CAD is not useful at all.19-22 It is difficult to get an overall picture, though, given the wide variation in the design and purpose of different studies. For example, the relative proportions of masses and microcalcifications in the study population may vary, the background density may make it easier or harder to see masses, CAD technology may have evolved from one study to the next, different reading protocols may be employed, etc.1

It should not be forgotten that reader experience can affect CAD performance considerably. CAD is generally assumed to be of more help to inexperienced radiologists. This is because practitioners with less experience making diagnostic decisions will be more likely to be influenced by a positive or negative CAD result.1

Any evaluation of the influence of CAD on readers should consider the complex cognitive features related to human perception. Newer CAD systems may produce fewer prompts, but readers will still be confronted with a range of marks to examine in detail, most being false positives. Radiologists may develop unconscious strategies to reduce the effort of scrutinizing all the CAD prompts. This carries the risk that true positive findings will be missed.1

The exact impact of CAD on readers’ sensitivity has not been determined. One retrospective study of screening mammograms, involving two different CAD systems, showed that CAD improved the detection of cancers that had been overlooked by a single radiologist. This reduction in false negatives led to an increase in sensitivity from 71.2% for the single radiologist to 84.8% and 80.3% for the two CAD systems.23 Another study, focusing on the detection of previously missed cancers, demonstrated a sensitivity of 51.5% for CAD, 62.5% for the radiologist, and 86.2% for the radiologist and CAD combined.24

PRACTICAL EVALUATION

An emerging issue is that many CAD studies assume, incorrectly, that all cancers prompted by CAD would be interpreted correctly in the clinical environment, leading to an overestimation of CAD’s accuracy. Studies of CAD in routine breast assessment, rather than in the screening scenario, have been advocated to provide a more realistic measure of its impact on the accuracy of mammography in clinical practice.25

We evaluated the impact of CAD for full-field digital mammography in a clinical scenario shortly after the introduction of the software at our institution.26 A population of 93 patients (372 digital mammograms) was selected retrospectively. The cases comprised 23 patients with histologically proven malignant lesions (73.9% masses, 26.1% microcalcifications), 31 with benign lesions, and 39 with no lesions. The images were evaluated independently by six blinded radiologists of varying experience, before and after applying the CAD software.
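The composition of this study set can be checked with simple arithmetic. In the sketch below, the split of 17 masses and 6 microcalcification cases is inferred from the reported percentages rather than stated above, and the variable names are illustrative.

```python
patients = {"malignant": 23, "benign": 31, "no_lesion": 39}
total_patients = sum(patients.values())
mammograms = 372

images_per_patient = mammograms / total_patients

# Assumed split, inferred from the reported 73.9% / 26.1%.
masses, microcalcifications = 17, 6
mass_pct = round(100 * masses / patients["malignant"], 1)
microcalc_pct = round(100 * microcalcifications / patients["malignant"], 1)

malignancy_pct = round(100 * patients["malignant"] / total_patients, 1)

print(total_patients, images_per_patient, mass_pct, microcalc_pct, malignancy_pct)
# 93 4.0 73.9 26.1 24.7
```

The figures are consistent with four images per patient. Nearly a quarter of the cases are malignant, a far higher prevalence than in a screening population, so the set is subject to the prevalence bias noted earlier for retrospective studies.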

Our results confirmed the general trend discussed above for the screening scenario. Average reader sensitivity for all types of lesions (masses and microcalcifications) varied according to experience. The sensitivity of inexperienced and mildly experienced radiologists to all types of lesions and to masses alone showed a statistically significant increase (p < 0.05) when CAD was applied. The use of CAD also increased the false-positive rate slightly, though not significantly (p > 0.05).

In conclusion, CAD is still a controversial matter. Most evidence supports its use as a second reader for breast cancer screening. CAD provides adequate sensitivity for the detection of breast cancer, especially microcalcifications, and is particularly helpful when used by less experienced radiologists. The software tools remain less sensitive to masses than to microcalcifications, especially in dense breasts. CAD’s Achilles heel is its related decrease in specificity, and this must be considered when contemplating its use as a screening tool.