U.S. Department of Energy
Office of Scientific and Technical Information

Overestimation of test performance by ROC analysis: Effect of small sample size

Conference · J. Nucl. Med.; (United States)
OSTI ID: 7090362

New imaging systems are often evaluated by ROC analysis of observer ratings. For practical reasons the number of different images, or sample size (SS), is kept small, so any systematic bias due to small SS would distort system evaluation. The authors set out to determine whether the area under the ROC curve (AUC) is systematically biased by small SS. Monte Carlo techniques were used to simulate observer performance in distinguishing signal (SN) from noise (N) on a 6-point scale, with P(SN) = P(N) = 0.5. Four sample sizes (15, 25, 50 and 100 each of SN and N), three ROC slopes (0.8, 1.0 and 1.25), and three intercepts (0.8, 1.0 and 1.25) were considered. For each of the 36 combinations of SS, slope and intercept, 2000 runs were simulated. Results showed a systematic bias: the observed AUC exceeded the expected AUC in every one of the 36 combinations, with the smallest sample sizes showing the largest bias. This suggests that evaluations of imaging systems using ROC curves based on small sample sizes systematically overestimate system performance. The effect is consistent but subtle (at most 10% of the AUC standard deviation), and is probably masked by the standard deviation in most practical settings. Although the effect of sample size was statistically significant (F = 33.34, P < 0.0001), no significant effect was found for either the ROC curve slope or the intercept. Overestimation of test performance with small SS seems to be an inherent characteristic of the ROC technique that has not previously been described.
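
The simulation design described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes a binormal model in which the ROC line is z(TPF) = intercept + slope * z(FPF), so that the expected AUC is Phi(intercept / sqrt(1 + slope^2)). The five rating cutpoints are hypothetical, and the AUC estimator is a swapped-in nonparametric (Mann-Whitney) estimate, because the abstract does not say how the observed AUC was computed. That estimator is unbiased for the area under the 6-point empirical ROC, so this sketch shows the structure of the experiment and should not be expected to reproduce the reported bias.

```python
import random
from itertools import product
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()

# Design grid taken from the abstract: 4 sample sizes x 3 slopes x 3 intercepts.
SAMPLE_SIZES = (15, 25, 50, 100)
SLOPES = (0.8, 1.0, 1.25)
INTERCEPTS = (0.8, 1.0, 1.25)
DESIGN = list(product(SAMPLE_SIZES, SLOPES, INTERCEPTS))  # 36 cells

# Hypothetical cutpoints on the latent decision axis (5 cuts -> 6 ratings).
CUTPOINTS = (-0.5, 0.0, 0.5, 1.0, 1.5)


def expected_auc(intercept, slope):
    """AUC of the assumed binormal ROC: Phi(a / sqrt(1 + b^2))."""
    return STD_NORMAL.cdf(intercept / (1.0 + slope ** 2) ** 0.5)


def rate(x, cutpoints=CUTPOINTS):
    """Map a latent value to a rating in 1..len(cutpoints)+1."""
    return 1 + sum(x > c for c in cutpoints)


def empirical_auc(noise_ratings, signal_ratings, n_categories=6):
    """Mann-Whitney AUC on ratings, counting ties as one half."""
    noise_counts = [0] * (n_categories + 1)
    signal_counts = [0] * (n_categories + 1)
    for r in noise_ratings:
        noise_counts[r] += 1
    for r in signal_ratings:
        signal_counts[r] += 1
    wins = 0.0
    noise_below = 0  # noise cases rated strictly below category k
    for k in range(1, n_categories + 1):
        wins += signal_counts[k] * (noise_below + 0.5 * noise_counts[k])
        noise_below += noise_counts[k]
    return wins / (len(noise_ratings) * len(signal_ratings))


def simulate_cell(n_per_class, intercept, slope, runs, rng):
    """Mean empirical AUC over `runs` simulated rating experiments."""
    # Binormal assumption: noise ~ N(0, 1), signal ~ N(a/b, 1/b).
    mu_s, sd_s = intercept / slope, 1.0 / slope
    aucs = []
    for _ in range(runs):
        noise = [rate(rng.gauss(0.0, 1.0)) for _ in range(n_per_class)]
        signal = [rate(rng.gauss(mu_s, sd_s)) for _ in range(n_per_class)]
        aucs.append(empirical_auc(noise, signal))
    return mean(aucs)


if __name__ == "__main__":
    rng = random.Random(0)
    # One cell of the 36, with the 2000 runs mentioned in the abstract.
    observed = simulate_cell(15, 1.0, 1.0, 2000, rng)
    print("mean observed AUC:", observed)
    print("binormal expected AUC:", expected_auc(1.0, 1.0))
```

Looping `simulate_cell` over `DESIGN` gives one mean AUC per cell. To study the kind of bias the abstract reports, the nonparametric estimate would need to be replaced by whatever curve-fitting estimator is of interest, with its output compared against `expected_auc` for the same slope and intercept.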

Research Organization:
Univ. of Arizona, Tucson, AZ
Report Number(s):
CONF-840619-
Journal Information:
Journal Name: J. Nucl. Med.; (United States); Vol. 25:5; ISSN JNMEA
Country of Publication:
United States
Language:
English