OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Analysis and minimization of overtraining effect in rule-based classifiers for computer-aided diagnosis

Abstract

Computer-aided diagnostic (CAD) schemes have been developed to assist radiologists in detecting various lesions in medical images. In CAD schemes, classifiers play a key role in achieving a high lesion detection rate and a low false-positive rate. Although many popular classifiers, such as linear discriminant analysis and artificial neural networks, have been employed in CAD schemes for the reduction of false positives, the rule-based classifier has probably been the simplest and most frequently used one since the early days of CAD development. However, existing rule-based classifiers have major disadvantages that significantly reduce their practicality and credibility: manual design, poor reproducibility, poor evaluation methods such as resubstitution, and a large overtraining effect. An automated rule-based classifier with a minimized overtraining effect can overcome or significantly reduce these disadvantages. In this study, we developed an 'optimal' method for the selection of cutoff thresholds and a fully automated rule-based classifier. Experimental results obtained with Monte Carlo simulation and a real lung nodule CT data set demonstrated that the automated threshold selection method can completely eliminate the overtraining effect in the procedure of cutoff threshold selection, and thus can minimize the overall overtraining effect in the constructed rule-based classifier. We believe that this threshold selection method is very useful in the construction of automated rule-based classifiers with a minimized overtraining effect.
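As a rough illustration of the overtraining effect discussed in the abstract, the Python sketch below picks a cutoff threshold on a small training set and compares resubstitution performance with performance on independent data. It is not the authors' 'optimal' threshold selection method, which the abstract does not describe. The naive "cut at the smallest training nodule value" rule, the Gaussian feature distributions, the sample sizes, and the function names (select_cutoff, retained_fraction, simulate) are all assumptions made for illustration.

```python
import random


def select_cutoff(nodule_values):
    """Naive rule (an assumption, not the paper's method): place the cutoff at
    the smallest training nodule value so that no training nodule is lost."""
    return min(nodule_values)


def retained_fraction(values, cutoff):
    """Fraction of candidates kept by the rule `value >= cutoff`."""
    return sum(v >= cutoff for v in values) / len(values)


def simulate(n_train=50, n_test=5000, seed=0):
    """Monte Carlo style comparison of resubstitution vs. independent testing.

    Nodules and false positives are drawn from two unit-variance Gaussians
    (hypothetical feature distributions chosen only for this sketch).
    """
    rng = random.Random(seed)

    def draw(mean, n):
        return [rng.gauss(mean, 1.0) for _ in range(n)]

    train_nodules, train_fps = draw(2.0, n_train), draw(0.0, n_train)
    test_nodules, test_fps = draw(2.0, n_test), draw(0.0, n_test)

    cutoff = select_cutoff(train_nodules)
    return {
        "cutoff": cutoff,
        # Resubstitution: evaluated on the same data used to pick the cutoff.
        "train_sensitivity": retained_fraction(train_nodules, cutoff),
        "train_fp_retained": retained_fraction(train_fps, cutoff),
        # Independent data: the cutoff was not tuned on these samples.
        "test_sensitivity": retained_fraction(test_nodules, cutoff),
        "test_fp_retained": retained_fraction(test_fps, cutoff),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

By construction the resubstitution sensitivity is always 1.0, whereas the sensitivity on independent data can only be equal or lower. That gap is the kind of optimistic bias that resubstitution evaluation hides.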

Authors:
Li Qiang; Doi Kunio [1]
  1. Department of Radiology, University of Chicago, 5841 S. Maryland Avenue, Chicago, Illinois 60637 (United States)
Publication Date:
2006-02-15
OSTI Identifier:
20775055
Resource Type:
Journal Article
Resource Relation:
Journal Name: Medical Physics; Journal Volume: 33; Journal Issue: 2; Other Information: DOI: 10.1118/1.1999126; (c) 2006 American Association of Physicists in Medicine; Country of input: International Atomic Energy Agency (IAEA)
Country of Publication:
United States
Language:
English
Subject:
62 RADIOLOGY AND NUCLEAR MEDICINE; COMPUTERIZED SIMULATION; COMPUTERIZED TOMOGRAPHY; DIAGNOSIS; IMAGE PROCESSING; IMAGES; LUNGS; MONTE CARLO METHOD; NEOPLASMS; NEURAL NETWORKS

Citation Formats

Li Qiang, and Doi Kunio. Analysis and minimization of overtraining effect in rule-based classifiers for computer-aided diagnosis. United States: N. p., 2006. Web. doi:10.1118/1.1999126.
Li Qiang, & Doi Kunio. Analysis and minimization of overtraining effect in rule-based classifiers for computer-aided diagnosis. United States. doi:10.1118/1.1999126.
Li Qiang, and Doi Kunio. 2006. "Analysis and minimization of overtraining effect in rule-based classifiers for computer-aided diagnosis". United States. doi:10.1118/1.1999126.
@article{osti_20775055,
title = {Analysis and minimization of overtraining effect in rule-based classifiers for computer-aided diagnosis},
author = {Li Qiang and Doi Kunio},
abstractNote = {Computer-aided diagnostic (CAD) schemes have been developed to assist radiologists detect various lesions in medical images. In CAD schemes, classifiers play a key role in achieving a high lesion detection rate and a low false-positive rate. Although many popular classifiers such as linear discriminant analysis and artificial neural networks have been employed in CAD schemes for reduction of false positives, a rule-based classifier has probably been the simplest and most frequently used one since the early days of development of various CAD schemes. However, with existing rule-based classifiers, there are major disadvantages that significantly reduce their practicality and credibility. The disadvantages include manual design, poor reproducibility, poor evaluation methods such as resubstitution, and a large overtraining effect. An automated rule-based classifier with a minimized overtraining effect can overcome or significantly reduce the extent of the above-mentioned disadvantages. In this study, we developed an 'optimal' method for the selection of cutoff thresholds and a fully automated rule-based classifier. Experimental results performed with Monte Carlo simulation and a real lung nodule CT data set demonstrated that the automated threshold selection method can completely eliminate overtraining effect in the procedure of cutoff threshold selection, and thus can minimize overall overtraining effect in the constructed rule-based classifier. We believe that this threshold selection method is very useful in the construction of automated rule-based classifiers with minimized overtraining effect.},
doi = {10.1118/1.1999126},
journal = {Medical Physics},
number = 2,
volume = 33,
place = {United States},
year = {2006},
month = {2}
}
  • Purpose: To develop a semiautomated computer-aided diagnosis (CAD) system for thyroid cancer using two-dimensional ultrasound images that can be used to yield a second opinion in the clinic to differentiate malignant and benign lesions. Methods: A total of 118 ultrasound images that included axial and longitudinal images from patients with biopsy-confirmed malignant (n = 30) and benign (n = 29) nodules were collected. Thyroid CAD software was developed to extract quantitative features from these images based on thyroid nodule segmentation in which adaptive diffusion flow for active contours was used. Various features, including histogram, intensity differences, elliptical fit, gray-level co-occurrence matrixes, and gray-level run-length matrixes, were evaluated for each region imaged. Based on these imaging features, a support vector machine (SVM) classifier was used to differentiate benign and malignant nodules. Leave-one-out cross-validation with sequential forward feature selection was performed to evaluate the overall accuracy of this method. Additionally, analyses with contingency tables and receiver operating characteristic (ROC) curves were performed to compare the performance of CAD with visual inspection by expert radiologists based on established gold standards. Results: Most univariate features for this proposed CAD system attained accuracies that ranged from 78.0% to 83.1%. When optimal SVM parameters that were established using a grid search method with features that radiologists use for visual inspection were employed, the authors could attain rates of accuracy that ranged from 72.9% to 84.7%. Using leave-one-out cross-validation results in a multivariate analysis of various features, the highest accuracy achieved using the proposed CAD system was 98.3%, whereas visual inspection by radiologists reached 94.9% accuracy. To obtain the highest accuracies, “axial ratio” and “max probability” in axial images were most frequently included in the optimal feature sets for the authors’ proposed CAD system, while “shape” and “calcification” in longitudinal images were most frequently included in the optimal feature sets for visual inspection by radiologists. The computed areas under curves in the ROC analysis were 0.986 and 0.979 for the proposed CAD system and visual inspection by radiologists, respectively; no significant difference was detected between these groups. Conclusions: The use of thyroid CAD to differentiate malignant from benign lesions shows accuracy similar to that obtained via visual inspection by radiologists. Thyroid CAD might be considered a viable way to generate a second opinion for radiologists in clinical practice.
  • Ensemble classifiers have been shown to be efficient in multiple applications. In this article, the authors explore the effectiveness of ensemble classifiers in a case-based computer-aided diagnosis system for detection of masses in mammograms. They evaluate two general ways of constructing subclassifiers by resampling of the available development dataset: random division and random selection. Furthermore, they discuss the problem of selecting the ensemble size and propose two adaptive incremental techniques that automatically select the size for the problem at hand. All the techniques are evaluated with respect to a previously proposed information-theoretic CAD system (IT-CAD). The experimental results show that the examined ensemble techniques provide a statistically significant improvement (AUC = 0.905 ± 0.024) in performance as compared to the original IT-CAD system (AUC = 0.865 ± 0.029). Some of the techniques allow for a notable reduction in the total number of examples stored in the case base (to 1.3% of the original size), which, in turn, results in lower storage requirements and a shorter response time of the system. Among the methods examined in this article, the two proposed adaptive techniques are by far the most effective for this purpose. Furthermore, the authors provide some discussion and guidance for choosing the ensemble parameters.
  • Correlation of information from multiple-view mammograms (e.g., MLO and CC views, bilateral views, or current and prior mammograms) can improve the performance of breast cancer diagnosis by radiologists or by computer. The nipple is a reliable and stable landmark on mammograms for the registration of multiple mammograms. However, accurate identification of nipple location on mammograms is challenging because of the variations in image quality and in the nipple projections, resulting in some nipples being nearly invisible on the mammograms. In this study, we developed a computerized method to automatically identify the nipple location on digitized mammograms. First, the breast boundary was obtained using a gradient-based boundary tracking algorithm, and then the gray level profiles along the inside and outside of the boundary were identified. A geometric convergence analysis was used to limit the nipple search to a region of the breast boundary. A two-stage nipple detection method was developed to identify the nipple location using the gray level information around the nipple, the geometric characteristics of nipple shapes, and the texture features of glandular tissue or ducts which converge toward the nipple. At the first stage, a rule-based method was designed to identify the nipple location by detecting significant changes of intensity along the gray level profiles inside and outside the breast boundary and the changes in the boundary direction. At the second stage, a texture orientation-field analysis was developed to estimate the nipple location based on the convergence of the texture pattern of glandular tissue or ducts towards the nipple. The nipple location was finally determined from the detected nipple candidates by a rule-based confidence analysis. In this study, 377 and 367 randomly selected digitized mammograms were used for training and testing the nipple detection algorithm, respectively. Two experienced radiologists identified the nipple locations, which were used as the gold standard. In the training data set, 301 nipples were positively identified and were referred to as visible nipples. Seventy-six nipples could not be positively identified and were referred to as invisible nipples. The radiologists provided their estimation of the nipple locations in the latter group for comparison with the computer estimates. The computerized method could detect 89.37% (269/301) of the visible nipples and 69.74% (53/76) of the invisible nipples within 1 cm of the gold standard. In the test data set, 298 and 69 of the nipples were classified as visible and invisible, respectively. 92.28% (275/298) of the visible nipples and 53.62% (37/69) of the invisible nipples were identified within 1 cm of the gold standard. The results demonstrate that the nipple locations on digitized mammograms can be accurately detected if they are visible and can be reasonably estimated if they are invisible. Automated nipple detection will be an important step towards multiple image analysis for CAD.
  • We have developed a false positive (FP) reduction method based on analysis of bilateral mammograms for computerized mass detection systems. The mass candidates on each view were first detected by our unilateral computer-aided detection (CAD) system. For each detected object, a regional registration technique was used to define a region of interest (ROI) that is "symmetrical" to the object location on the contralateral mammogram. Texture features derived from the spatial gray level dependence matrices and morphological features were extracted from the ROI containing the detected object on a mammogram and its corresponding ROI on the contralateral mammogram. Bilateral features were then generated from corresponding pairs of unilateral features for each object. Two linear discriminant analysis (LDA) classifiers were trained from the unilateral and the bilateral feature spaces, respectively. Finally, the scores from the unilateral LDA classifier and the bilateral LDA asymmetry classifier were fused with a third LDA whose output score was used to distinguish true masses from FPs. A data set of 341 cases of bilateral two-view mammograms was used in this study, of which 276 cases with 552 bilateral pairs contained 110 malignant and 166 benign biopsy-proven masses and 65 cases with 130 bilateral pairs were normal. The mass data set was divided into two subsets for twofold cross-validation training and testing. The normal data set was used for estimation of FP rates. It was found that our bilateral CAD system achieved a case-based sensitivity of 70%, 80%, and 85% at average FP rates of 0.35, 0.75, and 0.95 FPs/image, respectively, on the test data sets with malignant masses. In comparison to the average FP rates for the unilateral CAD system of 0.58, 1.33, and 1.63, respectively, at the corresponding sensitivities, the FP rates were reduced by 40%, 44%, and 42% with the bilateral symmetry information. The improvement was statistically significant (p<0.05) as estimated by JAFROC analysis.
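The false-positive reductions quoted in the last record above (bilateral versus unilateral mammographic CAD) can be checked with simple arithmetic. The short Python sketch below recomputes the relative reductions from the FP rates given in that abstract. The helper name relative_reduction and the rounding to whole percent are choices made for this illustration, not something taken from the source.

```python
def relative_reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Average FP rates (FPs/image) at case-based sensitivities of 70%, 80%, and 85%,
# as quoted in the abstract.
unilateral = [0.58, 1.33, 1.63]
bilateral = [0.35, 0.75, 0.95]

# Round to whole percent, matching the precision used in the abstract.
reductions = [round(relative_reduction(u, b)) for u, b in zip(unilateral, bilateral)]
print(reductions)  # [40, 44, 42]
```

The recomputed values agree with the 40%, 44%, and 42% reductions stated in the abstract.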