OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: Evaluation of normalization methods for cDNA microarray data by k-NN classification

Journal Article · BMC Bioinformatics
 [1];  [2];  [3];  [3];  [3]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Life Sciences Division; Univ. of Pittsburgh, PA (United States). Medical Center. Allergy and Critical Care Medicine. Division of Pulmonary. Dorothy P. and Richard P. Simmons Center for Interstitial Lung Disease
  2. Carnegie Mellon Univ., Pittsburgh, PA (United States). School of Computer Science. Center for Automated Learning and Discovery. Language Technology Inst.
  3. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Life Sciences Division

Background: Non-biological factors give rise to unwanted variations in cDNA microarray data. Many normalization methods have been designed to remove such variations. However, to date there have been few published systematic evaluations of these techniques for removing variations arising from dye biases in the context of downstream, higher-order analytical tasks such as classification.
Results: Ten location normalization methods that adjust spatial- and/or intensity-dependent dye biases, and three scale methods that adjust scale differences, were applied, individually and in combination, to five distinct, published, cancer biology-related cDNA microarray data sets. Leave-one-out cross-validation (LOOCV) classification error was employed as the quantitative end-point for assessing the effectiveness of a normalization method. In particular, a known classifier, k-nearest neighbor (k-NN), was estimated from data normalized using a given technique, and the LOOCV error rate of the ensuing model was computed. We found that k-NN classifiers are sensitive to dye biases in the data. Using NONRM and GMEDIAN as baseline methods, our results show that single-bias-removal techniques, which remove either spatial-dependent dye bias (referred to later as spatial effect) or intensity-dependent dye bias (referred to later as intensity effect), moderately reduce LOOCV classification errors, whereas double-bias-removal techniques, which remove both spatial and intensity effects, reduce LOOCV classification errors even further. Of the 41 different strategies examined, three two-step processes, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, all of which removed intensity effect globally and spatial effect locally, appear to reduce LOOCV classification errors most consistently and effectively across all data sets. We also found that the investigated scale normalization methods do not reduce LOOCV classification error.
Conclusion: Using LOOCV error of k-NNs as the evaluation criterion, three double-bias-removal normalization strategies, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, outperform other strategies for removing spatial effect, intensity effect and scale differences from cDNA microarray data. The apparent sensitivity of k-NN LOOCV classification error to dye biases suggests that this criterion provides an informative measure for evaluating normalization methods. All the computational tools used in this study were implemented using the R language for statistical computing and graphics.
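The evaluation criterion described in the abstract (normalize the arrays, then score the result by the LOOCV error of a k-NN classifier) can be illustrated with a minimal sketch. It is written in Python rather than the R used in the study, and it is not the authors' code. The per-array median centering below is a simple stand-in for a location normalization, not any of the named methods from the paper; the function names (`median_center`, `knn_predict`, `loocv_error`) and the toy log-ratio data are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def median_center(arrays):
    """Stand-in location normalization (assumption): subtract each
    array's own median log-ratio from every value on that array."""
    centered = []
    for a in arrays:
        m = median(a)
        centered.append([v - m for v in a])
    return centered


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training arrays closest to `query`
    in Euclidean distance."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loocv_error(arrays, labels, k=3):
    """Leave-one-out cross-validation: hold out each array in turn,
    classify it from the remaining arrays, and return the fraction
    of held-out arrays that were misclassified."""
    errors = 0
    for i in range(len(arrays)):
        train_x = arrays[:i] + arrays[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if knn_predict(train_x, train_y, arrays[i], k) != labels[i]:
            errors += 1
    return errors / len(arrays)


# Toy data (hypothetical): two class patterns over four spots, with a
# different constant offset added to each array to mimic an array-wide bias.
pattern_a = [1, 0, -1, 0]
pattern_b = [-1, 0, 1, 0]
offsets = [0, 2, 4]
raw = ([[v + o for v in pattern_a] for o in offsets]
       + [[v + o for v in pattern_b] for o in offsets])
labels = ["A", "A", "A", "B", "B", "B"]

raw_error = loocv_error(raw, labels, k=3)
normalized_error = loocv_error(median_center(raw), labels, k=3)
print("LOOCV error, unnormalized:", raw_error)
print("LOOCV error, median-centered:", normalized_error)
```

On this toy data the array-wide offsets pull arrays of different classes together, so centering lowers the LOOCV error. Comparing normalization strategies then amounts to computing `loocv_error` once per strategy on the same labels and ranking the results.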

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1626306
Journal Information:
BMC Bioinformatics, Vol. 6, Issue 1; ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
