OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Integrative analysis of transcriptomic and proteomic data of Shewanella oneidensis: missing value imputation using temporal datasets

Journal Article · Molecular BioSystems
DOI: https://doi.org/10.1039/c0mb00260g · OSTI ID: 1081674

Despite significant improvements in recent years, currently available proteomic datasets still suffer from a large number of missing values. Integrative analyses based upon incomplete proteomic and transcriptomic datasets could seriously bias the biological interpretation. In this study, we applied a non-linear, data-driven stochastic gradient boosted trees (GBT) model to impute missing proteomic values for experimentally undetected proteins, using a temporal transcriptomic and proteomic dataset of Shewanella oneidensis. In this dataset, gene expression was measured after the cells were exposed to 1 mM potassium chromate for 5, 30, 60, and 90 min, while protein abundance was measured only for the 45- and 90-min samples. With the goal of elucidating the relationship between temporal gene expression and protein abundance data, and then using it to impute missing proteomic values for the 45-min sample (which has no cognate transcriptomic data) and the 90-min sample, we first used non-linear Smoothing Splines Curve Fitting (SSCF) to identify temporal relationships among the transcriptomic data at different time points and then imputed the missing gene expression measurements for the 45-min sample. After the imputation was validated against biological constraints (i.e., operons), we used the GBT model to uncover possible non-linear relationships between the temporal transcriptomic and proteomic data, and to impute protein abundance for the proteins experimentally undetected in the 45- and 90-min samples, based on relevant predictors such as temporal mRNA gene expression data, cellular roles, molecular weight, sequence length, protein length, guanine-cytosine (GC) content, and triple codon counts. The imputed protein values were validated using biological constraints such as operon, regulon, and pathway information. Finally, we demonstrated that such missing value imputation improved characterization of the temporal response of S. oneidensis to chromate.
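
The imputation idea described in the abstract can be illustrated with a minimal sketch of stochastic gradient boosting. This is not the authors' implementation: it assumes squared-error loss, depth-1 trees (stumps), a random row subsample per boosting round (the "stochastic" part), and entirely synthetic data. The function names (`fit_stump`, `fit_gbt`, `predict`), the four predictors, and the simulated abundance formula are hypothetical and chosen only for illustration.

```python
import random

def fit_stump(X, residuals, rows):
    """Best single-feature threshold split (least squared error) on the sampled rows."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for j in range(len(X[0])):
        values = sorted({X[i][j] for i in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [residuals[i] for i in rows if X[i][j] <= thr]
            right = [residuals[i] for i in rows if X[i][j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:  # no valid split: fall back to a constant prediction
        mean = sum(residuals[i] for i in rows) / len(rows)
        return 0, float("inf"), mean, mean
    return best[1:]

def fit_gbt(X, y, n_rounds=200, learning_rate=0.1, subsample=0.7, seed=0):
    """Stochastic gradient boosting with stumps and squared-error loss."""
    rng = random.Random(seed)
    n = len(y)
    base = sum(y) / n
    pred = [base] * n
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the plain residual.
        residuals = [y[i] - pred[i] for i in range(n)]
        # Each round fits its tree on a random subset of the rows.
        rows = rng.sample(range(n), max(2, int(subsample * n)))
        j, thr, lm, rm = fit_stump(X, residuals, rows)
        stumps.append((j, thr, lm, rm))
        for i in range(n):
            pred[i] += learning_rate * (lm if X[i][j] <= thr else rm)
    return base, learning_rate, stumps

def predict(model, x):
    base, lr, stumps = model
    return base + lr * sum(lm if x[j] <= thr else rm for j, thr, lm, rm in stumps)

# Synthetic stand-in data (assumption): per-protein predictors are the mRNA
# log-ratios at 45 and 90 min, molecular weight (kDa), and GC content.
data_rng = random.Random(42)
features, abundance = [], []
for _ in range(80):
    m45, m90 = data_rng.uniform(-2, 2), data_rng.uniform(-2, 2)
    mw, gc = data_rng.uniform(10, 120), data_rng.uniform(0.35, 0.55)
    features.append([m45, m90, mw, gc])
    # Made-up non-linear relationship between mRNA and protein abundance.
    abundance.append(1.5 * max(m90, 0.0) + 0.5 * m45 + data_rng.gauss(0, 0.1))

detected = list(range(60))        # proteins with measured abundance
undetected = list(range(60, 80))  # proteins treated as missing

model = fit_gbt([features[i] for i in detected], [abundance[i] for i in detected])
imputed = {i: predict(model, features[i]) for i in undetected}

# Compare against imputing every missing value with the mean of detected proteins.
mean_abund = sum(abundance[i] for i in detected) / len(detected)
gbt_mse = sum((imputed[i] - abundance[i]) ** 2 for i in undetected) / len(undetected)
baseline_mse = sum((mean_abund - abundance[i]) ** 2 for i in undetected) / len(undetected)
print(f"GBT imputation MSE: {gbt_mse:.3f}; mean-imputation MSE: {baseline_mse:.3f}")
```

In this sketch the held-out "undetected" proteins have known simulated abundances, so the imputation error can be compared with a mean-imputation baseline. With real data no such ground truth exists for the missing values, which is why the study validates against operon, regulon, and pathway information instead.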

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
DOE Contract Number:
DE-AC05-00OR22725
OSTI ID:
1081674
Journal Information:
Molecular BioSystems, Vol. 7, Issue 7; ISSN 1742-206X
Country of Publication:
United States
Language:
English

Similar Records

Molecular Dynamics of the Shewanella oneidensis Response to Chromate Stress
Journal Article · Sun Jan 01 00:00:00 EST 2006 · Molecular and Cellular Proteomics · OSTI ID:1081674

Molecular Dynamics of the Shewanella oneidensis Response to Chromate Stress
Journal Article · Fri Sep 21 00:00:00 EDT 2007 · Molecular and Cellular Proteomics · OSTI ID:1081674

Comparative Temporal Proteomics of a Response Regulator (SO2426)-Deficient Strain and Wild-Type Shewanella oneidensis MR-1 During Chromate Transformation
Journal Article · Thu Jan 01 00:00:00 EST 2009 · Journal of Proteome Research · OSTI ID:1081674
