OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics

Journal Article · Journal of Proteome Research
DOI: https://doi.org/10.1021/pr501138h · OSTI ID: 1287493

In this review, we apply selected imputation strategies to label-free liquid chromatography–mass spectrometry (LC–MS) proteomics datasets to evaluate their accuracy with respect to metrics of variance and classification. We evaluate several commonly used imputation approaches on their individual merits and discuss the caveats of each approach with respect to the example LC–MS proteomics data. In general, local similarity-based approaches, such as the regularized expectation maximization and least-squares adaptive algorithms, yield the best overall performance with respect to metrics of accuracy and robustness. However, no single algorithm consistently outperforms the remaining approaches, and in some cases, performing classification without imputation yielded the most accurate classification. Thus, because of the complex mechanisms of missing data in proteomics, which also vary from peptide to protein, no individual method is a single solution for imputation. In summary, on the basis of the observations in this review, the goal for imputation in the field of computational proteomics should be to develop new approaches that work generically for this data type and new strategies to guide users in the selection of the best imputation for their dataset and analysis objectives.
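
As a rough illustration of the kind of evaluation the abstract describes, the sketch below hides known values in a complete intensity matrix, imputes them with a simple nearest-neighbour (local similarity) scheme, and reports the root-mean-square error on the hidden entries. This is not the review's code: k-nearest-neighbour averaging stands in for the regularized expectation maximization and least-squares adaptive algorithms, and the names `knn_impute` and `masked_rmse`, the `k` parameter, and the toy matrix are all assumptions made for the example.

```python
import numpy as np


def knn_impute(X, k=3):
    """Fill NaNs in each row with the mean of its k most similar rows.

    Hypothetical helper for illustration; rows are peptides, columns are samples.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    for i in range(X.shape[0]):
        missing = np.isnan(X[i])
        if not missing.any():
            continue
        observed = ~missing
        candidates = []
        for j in range(X.shape[0]):
            if j == i:
                continue
            shared = observed & ~np.isnan(X[j])
            # A neighbour must overlap with row i and be observed where row i is not.
            if shared.sum() == 0 or np.isnan(X[j][missing]).any():
                continue
            dist = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
            candidates.append((dist, j))
        if not candidates:
            continue  # no usable neighbour: leave the gaps in place
        candidates.sort()
        nearest = [j for _, j in candidates[:k]]
        filled[i, missing] = X[nearest][:, missing].mean(axis=0)
    return filled


def masked_rmse(X_complete, mask, impute):
    """Hide the entries flagged in `mask`, impute them, and score against the truth."""
    X_complete = np.asarray(X_complete, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    X_missing = X_complete.copy()
    X_missing[mask] = np.nan
    X_hat = impute(X_missing)
    err = X_hat[mask] - X_complete[mask]
    # Entries the imputer could not fill stay NaN and are ignored here.
    return float(np.sqrt(np.nanmean(err ** 2)))


# Toy data (made up): hide one value in the second row and score the imputation.
X_true = np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 5.0], [10.0, 20.0, 30.0]])
mask = np.zeros_like(X_true, dtype=bool)
mask[1, 2] = True
print(masked_rmse(X_true, mask, lambda M: knn_impute(M, k=1)))
```

Masking values completely at random, as done here, is itself a simplification: the abstract stresses that missingness in proteomics arises from complex mechanisms, so a score from this kind of test may not carry over to real data.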

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number:
AC06-76RL01830; DK071283; HHSN27220080060C; U01CA184783-01; U54-ES016015; P41-RR018522; P41-GM103493
OSTI ID:
1287493
Journal Information:
Journal of Proteome Research, Vol. 14, Issue 5; ISSN 1535-3893
Publisher:
American Chemical Society (ACS)
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 145 works
Citation information provided by Web of Science

Cited By (42)

MetaPro-IQ: a universal metaproteomic approach to studying human and mouse gut microbiota journal June 2016
Comparison of iTRAQ and SWATH in a clinical study with multiple time points journal July 2018
Global proteomics profiling improves drug sensitivity prediction: results from a multi-omics, pan-cancer modeling approach journal November 2017
Using hyperLOPIT to perform high-resolution mapping of the spatial proteome journal May 2017
Random forest-based imputation outperforms other methods for imputing LC-MS metabolomics data: a comparative study journal October 2019
BayesENproteomics: Bayesian elastic nets for quantification of proteoforms in complex samples posted_content May 2019
Focus on the spectra that matter by clustering of quantification data in shotgun proteomics journal June 2020
Disease-specific IgG Fc N-glycosylation as personalized biomarkers to differentiate gastric cancer from benign gastric diseases journal May 2016
Machine Learning Applications for Mass Spectrometry-Based Metabolomics journal June 2020
A Bioconductor workflow for processing and analysing spatial proteomics data text January 2016
AKT but not MYC promotes reactive oxygen species-mediated cell death in oxidative culture journal February 2020
The use of missing values in proteomic data-independent acquisition mass spectrometry to enable disease activity discrimination journal December 2019
Robust determination of differential abundance in shotgun proteomics using nonparametric statistics journal January 2018
Advanced bioinformatics methods for practical applications in proteomics journal October 2017
AKT but not MYC promotes reactive oxygen species-mediated cell death in oxidative culture posted_content September 2019
Associations of diet and lifestyle factors with common volatile organic compounds in exhaled breath of average-risk individuals journal March 2019
Decreased Antibiotic Susceptibility Driven by Global Remodeling of the Klebsiella pneumoniae Proteome journal January 2019
The impact of nanoparticle protein corona on cytotoxicity, immunotoxicity and target drug delivery journal January 2016
Machine learning and feature selection for drug response prediction in precision oncology applications journal August 2018
MSqRob takes the missing hurdle: uniting intensity- and count-based proteomics posted_content November 2019
Identification of differentially expressed peptides in high-throughput proteomics data journal March 2017
Machine Learning Applications for Mass Spectrometry-Based Metabolomics text January 2020
Proteins involved in embryo-maternal interaction around the signalling of maternal recognition of pregnancy in the horse journal March 2018
Fuzzy-FishNET: a highly reproducible protein complex-based approach for feature selection in comparative proteomics journal December 2016
A Review on Quantitative Multiplexed Proteomics journal April 2019
Characterization of Lipid and Lipid Droplet Metabolism in Human HCC journal May 2019
A Bioconductor workflow for processing and analysing spatial proteomics data journal January 2016
Characterization of Lipid and Lipid Droplet Metabolism in Human HCC text January 2019
MS1 ion current‐based quantitative proteomics: A promising solution for reliable analysis of large biological cohorts journal March 2019
Phosphoproteomics in the Age of Rapid and Deep Proteome Profiling journal November 2015
PaDuA: A Python Library for High-Throughput (Phospho)proteomics Data Analysis journal December 2018
pmartR : Quality Control and Statistics for Mass Spectrometry-Based Biological Data journal January 2019
In-depth method assessments of differentially expressed protein detection for shotgun proteomics data with missing values journal June 2017
A Bayesian algorithm for detecting differentially expressed proteins and its application in breast cancer research journal July 2016
Quantitative proteomic and phosphoproteomic comparison of human colon cancer DLD-1 cells differing in ploidy and chromosome stability journal May 2018
A comprehensive evaluation of popular proteomics software workflows for label-free proteome quantification and imputation journal May 2017
Temporal probabilistic modeling of bacterial compositions derived from 16S rRNA sequencing posted_content September 2016
Integrating Transcriptomic and Proteomic Data Using Predictive Regulatory Network Models of Host Response to Pathogens journal July 2016
The Role of EGFR in Influenza Pathogenicity: Multiple Network-Based Approaches to Identify a Key Regulator of Non-lethal Infections journal September 2019
Standardizing Proteomics Workflow for Liquid Chromatography-Mass Spectrometry: Technical and Statistical Considerations journal January 2019
Dynamic post-translational modification profiling of Mycobacterium tuberculosis-infected primary macrophages journal January 2020
