OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: Integrative analysis of transcriptomic and proteomic data: challenges, solutions and applications

Abstract

Recent advances in high-throughput technologies enable quantitative monitoring of the abundance of various biological molecules and allow determination of their variation between biological states on a genomic scale. Two popular platforms are DNA microarrays to measure messenger RNA transcript levels, and gel-free proteomic analyses to determine protein abundance. Obviously, no single approach can fully unravel the complexities of fundamental biology, and it is equally clear that integrative analysis of multiple levels of gene expression would be valuable in this endeavor. However, most integrative transcriptomic and proteomic studies have thus far either failed to find a correlation or have only observed a weak correlation. It is evident that this failure is not biologically based, but rather is related to the inadequacy of available statistical tools to compensate for biases in the data collection methodologies. To address this issue, attempts have recently been made to systematically investigate the correlation patterns between transcriptomic and proteomic datasets, and to develop more sophisticated statistical tools to improve the chances of capturing a relationship. The goal of these investigations is to enhance our understanding of the relationship between transcriptome and proteome data so that integrative analyses may be utilized to reveal new biological insights that are not accessible through one-dimensional datasets. In this review, we outline some of the challenges associated with integrative analyses and present some preliminary solutions based on progress being made in recent years. In addition, some new applications of integrated transcriptomic and proteomic analysis to the investigation of post-transcriptional regulation will also be discussed.
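The abstract turns on how strongly mRNA and protein levels correlate. As a minimal illustration (not a method taken from the review), the sketch below compares Pearson and Spearman correlation on log2-transformed paired measurements. It assumes one mRNA value and one protein value per gene and keeps only genes detected on both platforms; all function names are hypothetical. Note that discarding undetected proteins, as this sketch does, is itself one of the data-collection biases the abstract refers to.

```python
from __future__ import annotations

import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(v: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def detected_pair_correlation(mrna: Sequence[float], protein: Sequence[float]):
    """Hypothetical helper: keep genes with mRNA > 0 and protein > 0,
    log2-transform both, and return (n_genes_used, pearson_r, spearman_rho)."""
    pairs = [(m, p) for m, p in zip(mrna, protein) if m > 0 and p > 0]
    lx = [math.log2(m) for m, _ in pairs]
    ly = [math.log2(p) for _, p in pairs]
    return len(pairs), pearson(lx, ly), spearman(lx, ly)
```

Spearman's coefficient depends only on ranks, so it is insensitive to any monotonic but nonlinear relationship between the two platforms' measurement scales, whereas Pearson's coefficient on log values assumes an approximately linear log-log relationship.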

Authors:
Nie, Lei; Wu, Gang; Culley, David E.; Scholten, Johannes C.; Zhang, Weiwen
Publication Date:
Sun Apr 01 00:00:00 EDT 2007
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
944514
Report Number(s):
PNNL-SA-53025
TRN: US200902%%808
DOE Contract Number:
AC05-76RL01830
Resource Type:
Journal Article
Resource Relation:
Journal Name: Critical Reviews in Biotechnology, 27(2):63-75; Journal Volume: 27; Journal Issue: 2
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; ABUNDANCE; BIOLOGY; DNA; GENES; MESSENGER-RNA; MONITORING; PROTEINS; REGULATIONS; Transcriptomics; Proteomics; Integration; Statistical; Review

Citation Formats

Nie, Lei, Wu, Gang, Culley, David E., Scholten, Johannes C., and Zhang, Weiwen. Integrative analysis of transcriptomic and proteomic data: challenges, solutions and applications. United States: N. p., 2007. Web. doi:10.1080/07388550701334212.
Nie, Lei, Wu, Gang, Culley, David E., Scholten, Johannes C., & Zhang, Weiwen. Integrative analysis of transcriptomic and proteomic data: challenges, solutions and applications. United States. doi:10.1080/07388550701334212.
Nie, Lei, Wu, Gang, Culley, David E., Scholten, Johannes C., and Zhang, Weiwen. 2007. "Integrative analysis of transcriptomic and proteomic data: challenges, solutions and applications". United States. doi:10.1080/07388550701334212.
@article{osti_944514,
title = {Integrative analysis of transcriptomic and proteomic data: challenges, solutions and applications},
author = {Nie, Lei and Wu, Gang and Culley, David E. and Scholten, Johannes C. and Zhang, Weiwen},
abstractNote = {Recent advances in high-throughput technologies enable quantitative monitoring of the abundance of various biological molecules and allow determination of their variation between biological states on a genomic scale. Two popular platforms are DNA microarrays to measure messenger RNA transcript levels, and gel-free proteomic analyses to determine protein abundance. Obviously, no single approach can fully unravel the complexities of fundamental biology, and it is equally clear that integrative analysis of multiple levels of gene expression would be valuable in this endeavor. However, most integrative transcriptomic and proteomic studies have thus far either failed to find a correlation or have only observed a weak correlation. It is evident that this failure is not biologically based, but rather is related to the inadequacy of available statistical tools to compensate for biases in the data collection methodologies. To address this issue, attempts have recently been made to systematically investigate the correlation patterns between transcriptomic and proteomic datasets, and to develop more sophisticated statistical tools to improve the chances of capturing a relationship. The goal of these investigations is to enhance our understanding of the relationship between transcriptome and proteome data so that integrative analyses may be utilized to reveal new biological insights that are not accessible through one-dimensional datasets. In this review, we outline some of the challenges associated with integrative analyses and present some preliminary solutions based on progress being made in recent years. In addition, some new applications of integrated transcriptomic and proteomic analysis to the investigation of post-transcriptional regulation will also be discussed.},
doi = {10.1080/07388550701334212},
journal = {Critical Reviews in Biotechnology},
pages = {63--75},
number = 2,
volume = 27,
place = {United States},
year = {2007},
month = apr
}
  • Despite significant improvements in recent years, currently available proteomic datasets still suffer from a large number of missing values. Integrative analyses based on incomplete proteomic and transcriptomic datasets could seriously bias the biological interpretation. In this study, we applied a nonlinear, data-driven stochastic gradient boosted trees (GBT) model to impute missing proteomic values for proteins that were experimentally undetected, using a temporal transcriptomic and proteomic dataset of Shewanella oneidensis. In this dataset, gene expression was measured after the cells were exposed to 1 mM potassium chromate for 5, 30, 60, and 90 min, while protein abundance was measured only for the 45- and 90-min samples. With the goal of elucidating the relationship between temporal gene expression and protein abundance data, and then using it to impute missing proteomic values for the 45-min sample (which lacks cognate transcriptomic data) and the 90-min sample, we first used nonlinear Smoothing Splines Curve Fitting (SSCF) to identify temporal relationships among the transcriptomic data at different time points, and then imputed the missing gene expression measurements for the 45-min sample. After the imputation was validated against biological constraints (i.e., operons), we used the GBT model to uncover possible nonlinear relationships between the temporal transcriptomic and proteomic data, and to impute protein abundance for the proteins experimentally undetected in the 45- and 90-min samples, based on relevant predictors such as temporal mRNA expression data, cellular roles, molecular weight, sequence length, protein length, guanine-cytosine (GC) content and triple codon counts. The imputed protein values were validated using biological constraints such as operon, regulon and pathway information. Finally, we demonstrated that such missing value imputation improved characterization of the temporal response of S. oneidensis to chromate.
  • Advances in DNA microarray and proteomics technologies have enabled high-throughput measurement of mRNA expression and protein abundance. Parallel profiling of mRNA and protein on a global scale and integrative analysis of these two data types could provide additional insight into the metabolic mechanisms underlying complex biological systems. However, because protein abundance and mRNA expression are affected by many cellular and physical processes, there have been conflicting results on the correlation of these two measurements. In addition, because current proteomic methods can detect only a small fraction of the proteins present in cells, no correlation study of these two data types has thus far been done at the whole-genome level. In this study, we describe a novel data-driven statistical model to integrate whole-genome microarray and proteomic data collected from Desulfovibrio vulgaris grown under three different conditions. Based on the Poisson distribution pattern of the proteomic data and the fact that a large number of proteins were undetected (excess zeros), zero-inflated Poisson models were used to define the correlation pattern of mRNA and protein abundance. The models assumed a probability mass at zero representing some of the proteins that went undetected because of technical limitations. The models thus use abundance measurements of experimentally detected transcripts and proteins as input to generate predictions of protein abundance as output for all genes in the genome. We demonstrated the statistical models by comparatively analyzing D. vulgaris grown on lactate-based versus formate-based media. Increased expression of Ech hydrogenase and of the alcohol dehydrogenase (Adh)-periplasmic Fe-only hydrogenase (Hyd) pathway for ATP synthesis was predicted for D. vulgaris grown on formate.
  • Visual Exploration and Statistics to Promote Annotation (VESPA) is an interactive visual analysis software tool that facilitates the discovery of structural mis-annotations in prokaryotic genomes. VESPA integrates high-throughput peptide-centric proteomics data and oligo-centric or RNA-Seq transcriptomics data into a genomic context. The data may be interrogated via visual analysis across multiple levels of genomic resolution, linked searches, exports and interaction with BLAST to rapidly identify locations of interest within the genome and evaluate potential mis-annotations.
  • Phosphatidylinositol-3-kinase (PI3K)/AKT pathway aberrations are common in cancer. By applying mass spectroscopy-based sequencing and reverse phase protein arrays to 547 human breast cancers and 41 cell lines, we determined the subtype specificity and signaling effects of PIK3CA, AKT and PTEN mutations, and the effects of PIK3CA mutations on responsiveness to PI3K inhibition in vitro and on outcome after adjuvant tamoxifen. PIK3CA mutations were more common in hormone receptor-positive (33.8%) and HER2-positive (24.6%) than in basal-like tumors (8.3%). AKT1 (1.4%) and PTEN (2.3%) mutations were restricted to hormone receptor-positive cancers, with PTEN protein levels also being significantly lower in hormone receptor-positive cancers. Unlike AKT1 mutations, PIK3CA (39%) and PTEN (20%) mutations were more common in cell lines than in tumors, suggesting a selection for these but not AKT1 mutations during adaptation to culture. PIK3CA mutations did not have a significant impact on outcome in 166 hormone receptor-positive breast cancer patients after adjuvant tamoxifen. PIK3CA mutations, in comparison with PTEN loss and AKT1 mutations, were associated with significantly less and indeed inconsistent activation of AKT and of downstream PI3K/AKT signaling in tumors and cell lines, and PTEN loss and PIK3CA mutation were frequently concordant, suggesting different contributions to pathophysiology. PTEN loss but not PIK3CA mutations rendered cells sensitive to growth inhibition by the PI3K inhibitor LY294002. Thus, PI3K pathway aberrations likely play a distinct role in the pathogenesis of different breast cancer subtypes. The specific aberration may have implications for the selection of PI3K-targeted therapies in hormone receptor-positive breast cancer.
  • The molecular mechanisms underlying the changes in the nigrostriatal pathway in Parkinson disease (PD) are not completely understood. Here we use mass spectrometry and microarrays to study the proteomic and transcriptomic changes in the striatum of two mouse models of PD, induced by the distinct neurotoxins 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP) and methamphetamine (METH). Proteomic analyses resulted in the identification and relative quantification of 912 proteins with two or more unique peptides and 85 proteins with significant abundance changes following neurotoxin treatment. Similarly, microarray analyses revealed 181 genes with significant changes in mRNA following neurotoxin treatment. The combined protein and gene list provides a clearer picture of the potential mechanisms underlying the neurodegeneration observed in PD. Functional analysis of this combined list revealed a number of significant categories, including mitochondrial dysfunction, oxidative stress response and apoptosis. Additionally, codon usage and miRNAs may play an important role in translational control in the striatum. These results constitute one of the largest datasets integrating protein and transcript changes for these neurotoxin models, which share many similar endpoint phenotypes but have distinct mechanisms.
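The Shewanella oneidensis record above fills a missing 45-min transcriptomic time point from the 5-, 30-, 60- and 90-min measurements using Smoothing Splines Curve Fitting (SSCF). The sketch below is not that method: it swaps in a plain interpolating natural cubic spline (the zero-smoothing limit of a cubic smoothing spline) so the mechanics fit in a few lines. It assumes one gene at a time, strictly increasing time points and no smoothing penalty; `natural_cubic_spline` is a hypothetical helper name.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def natural_cubic_spline(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Return S(x), the natural cubic spline through the points (xs[i], ys[i]).

    Natural boundary conditions: S''(xs[0]) = S''(xs[-1]) = 0.
    """
    n = len(xs) - 1  # number of intervals
    if n < 1 or len(ys) != len(xs):
        raise ValueError("need at least two points and len(xs) == len(ys)")
    if any(b <= a for a, b in zip(xs, xs[1:])):
        raise ValueError("xs must be strictly increasing")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives at the knots; m[0] = m[n] = 0
    if n >= 2:
        # Tridiagonal system for m[1..n-1]: sub-diagonal a, diagonal b,
        # super-diagonal c, right-hand side d.
        a = [h[i - 1] for i in range(1, n)]
        b = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        c = [h[i] for i in range(1, n)]
        d = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
             for i in range(1, n)]
        # Thomas algorithm: forward elimination ...
        for k in range(1, n - 1):
            w = a[k] / b[k - 1]
            b[k] -= w * c[k - 1]
            d[k] -= w * d[k - 1]
        # ... then back substitution.
        sol = [0.0] * (n - 1)
        sol[-1] = d[-1] / b[-1]
        for k in range(n - 3, -1, -1):
            sol[k] = (d[k] - c[k] * sol[k + 1]) / b[k]
        m[1:n] = sol

    def evaluate(x: float) -> float:
        # Interval i with xs[i] <= x <= xs[i+1]; points outside the data range
        # are extrapolated with the first or last cubic piece.
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        hi = h[i]
        left, right = xs[i + 1] - x, x - xs[i]
        return ((m[i] * left ** 3 + m[i + 1] * right ** 3) / (6.0 * hi)
                + (ys[i] / hi - m[i] * hi / 6.0) * left
                + (ys[i + 1] / hi - m[i + 1] * hi / 6.0) * right)

    return evaluate
```

For a single gene, `natural_cubic_spline([5, 30, 60, 90], values)(45)` would return the spline's estimate at 45 min. A true smoothing spline would additionally trade closeness of fit against curvature, which matters when measurement noise is large relative to the temporal trend.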
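The Desulfovibrio vulgaris record above models proteomic counts with zero-inflated Poisson (ZIP) models, in which extra probability mass at zero stands for proteins missed for technical reasons. Below is a minimal sketch of one common way to fit such a model, the EM algorithm for ZIP regression. It assumes integer protein counts and a single predictor per gene (for example a log-scale mRNA level); that predictor choice, the constant zero-inflation probability `pi`, and all function names are illustrative assumptions, not the authors' specification. The E-step gives each observed zero a posterior probability of being a structural zero; the M-step updates `pi` in closed form and refits the Poisson part by weighted Newton-Raphson.

```python
import math
from typing import Sequence, Tuple


def zip_loglik(x: Sequence[float], y: Sequence[int], b0: float, b1: float, pi: float) -> float:
    """Observed-data log-likelihood of the ZIP regression."""
    ll = 0.0
    for xi, yi in zip(x, y):
        lam = math.exp(b0 + b1 * xi)
        if yi == 0:
            ll += math.log(pi + (1.0 - pi) * math.exp(-lam))
        else:
            ll += math.log(1.0 - pi) + yi * math.log(lam) - lam - math.lgamma(yi + 1)
    return ll


def fit_zip(x: Sequence[float], y: Sequence[int], n_iter: int = 200,
            newton_steps: int = 3, tol: float = 1e-6) -> Tuple[float, float, float]:
    """EM fit of y_i ~ ZIP(pi, lam_i) with log(lam_i) = b0 + b1 * x_i.

    pi is the probability of a structural zero (e.g. a protein missed for
    technical reasons); lam_i is the Poisson mean when the protein is observable.
    """
    n = len(y)
    positives = [v for v in y if v > 0]
    b0 = math.log(sum(positives) / len(positives)) if positives else 0.0
    b1 = 0.0
    pi = 0.5 * sum(1 for v in y if v == 0) / n  # crude starting value
    prev = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each observed zero is structural.
        lam = [math.exp(b0 + b1 * xi) for xi in x]
        w = [pi / (pi + (1.0 - pi) * math.exp(-li)) if yi == 0 else 0.0
             for yi, li in zip(y, lam)]
        # M-step for pi: closed form.
        pi = sum(w) / n
        # M-step for (b0, b1): Poisson regression with weights (1 - w),
        # solved by a few Newton-Raphson steps on the 2x2 system.
        for _ in range(newton_steps):
            g0 = g1 = h00 = h01 = h11 = 0.0
            for xi, yi, wi in zip(x, y, w):
                li = math.exp(b0 + b1 * xi)
                r = 1.0 - wi
                g0 += r * (yi - li)
                g1 += r * (yi - li) * xi
                h00 += r * li
                h01 += r * li * xi
                h11 += r * li * xi * xi
            det = h00 * h11 - h01 * h01
            if det <= 0.0:
                raise ValueError("x must vary across observable genes")
            b0 += (h11 * g0 - h01 * g1) / det
            b1 += (h00 * g1 - h01 * g0) / det
        cur = zip_loglik(x, y, b0, b1, pi)
        if cur - prev < tol:
            break
        prev = cur
    return b0, b1, pi


def expected_abundance(xi: float, b0: float, b1: float, pi: float) -> float:
    """Model-predicted mean abundance, E[y | x] = (1 - pi) * exp(b0 + b1 * x)."""
    return (1.0 - pi) * math.exp(b0 + b1 * xi)
```

Once fitted, `expected_abundance` yields a prediction for every gene that has a predictor value, including genes whose protein was not detected, which parallels the record's aim of predicting protein abundance for all genes in the genome.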