OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Top-down analysis of protein samples by de novo sequencing techniques

Abstract

MOTIVATION: Recent technological advances have made high-resolution mass spectrometers affordable to many laboratories, thus boosting rapid development of top-down mass spectrometry, and implying a need for efficient methods for analyzing this kind of data. RESULTS: We describe a method for analysis of protein samples from top-down tandem mass spectrometry data, which capitalizes on de novo sequencing of fragments of the proteins present in the sample. Our algorithm takes as input a set of de novo amino acid strings derived from the given mass spectra using the recently proposed Twister approach, and combines them into aggregated strings endowed with offsets. The former typically constitute accurate sequence fragments of sufficiently well-represented proteins from the sample being analyzed, while the latter indicate their location in the protein sequence, and also bear information on post-translational modifications and fragmentation patterns.

Authors:
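Vyatkina, Kira; Wu, Si; Dekker, Lennard J. M.; VanDuijn, Martijn M.; Liu, Xiaowen; Tolić, Nikola; Luider, Theo M.; Paša-Tolić, Ljiljana; Pevzner, Pavel A.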
Publication Date:
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1342327
Report Number(s):
PNNL-SA-119182
Journal ID: ISSN 1367-4803; 49209; 46894; KP1704020
DOE Contract Number:
AC05-76RL01830
Resource Type:
Journal Article
Resource Relation:
Journal Name: Bioinformatics; Journal Volume: 32; Journal Issue: 18
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; Environmental Molecular Sciences Laboratory

Citation Formats

Vyatkina, Kira, Wu, Si, Dekker, Lennard J. M., VanDuijn, Martijn M., Liu, Xiaowen, Tolić, Nikola, Luider, Theo M., Paša-Tolić, Ljiljana, and Pevzner, Pavel A. Top-down analysis of protein samples by de novo sequencing techniques. United States: N. p., 2016. Web. doi:10.1093/bioinformatics/btw307.
Vyatkina, Kira, Wu, Si, Dekker, Lennard J. M., VanDuijn, Martijn M., Liu, Xiaowen, Tolić, Nikola, Luider, Theo M., Paša-Tolić, Ljiljana, & Pevzner, Pavel A. Top-down analysis of protein samples by de novo sequencing techniques. United States. doi:10.1093/bioinformatics/btw307.
Vyatkina, Kira, Wu, Si, Dekker, Lennard J. M., VanDuijn, Martijn M., Liu, Xiaowen, Tolić, Nikola, Luider, Theo M., Paša-Tolić, Ljiljana, and Pevzner, Pavel A. 2016. "Top-down analysis of protein samples by de novo sequencing techniques". United States. doi:10.1093/bioinformatics/btw307.
@article{osti_1342327,
title = {Top-down analysis of protein samples by de novo sequencing techniques},
author = {Vyatkina, Kira and Wu, Si and Dekker, Lennard J. M. and VanDuijn, Martijn M. and Liu, Xiaowen and Tolić, Nikola and Luider, Theo M. and Paša-Tolić, Ljiljana and Pevzner, Pavel A.},
abstractNote = {MOTIVATION: Recent technological advances have made high-resolution mass spectrometers affordable to many laboratories, thus boosting rapid development of top-down mass spectrometry, and implying a need for efficient methods for analyzing this kind of data. RESULTS: We describe a method for analysis of protein samples from top-down tandem mass spectrometry data, which capitalizes on de novo sequencing of fragments of the proteins present in the sample. Our algorithm takes as input a set of de novo amino acid strings derived from the given mass spectra using the recently proposed Twister approach, and combines them into aggregated strings endowed with offsets. The former typically constitute accurate sequence fragments of sufficiently well-represented proteins from the sample being analyzed, while the latter indicate their location in the protein sequence, and also bear information on post-translational modifications and fragmentation patterns.},
doi = {10.1093/bioinformatics/btw307},
journal = {Bioinformatics},
number = 18,
volume = 32,
place = {United States},
year = 2016,
month = 5
}
  • De novo sequencing of proteins and peptides is one of the most important problems in mass spectrometry-driven proteomics. A variety of methods have been developed to accomplish this task from a set of bottom-up tandem (MS/MS) mass spectra. However, the more recently emerged top-down technology, which is steadily gaining popularity, opens new perspectives for protein analysis and characterization, implying a need for efficient algorithms for processing this kind of MS/MS data. Here we describe a method for retrieving long and accurate sequence fragments of the proteins contained in a sample from a set of top-down MS/MS spectra. To this end, we outline a strategy for generating high-quality sequence tags from top-down spectra, and introduce the concept of a T-Bruijn graph by adapting the notion of an A-Bruijn graph, widely used in genomics, to the case of tags. The output of the proposed approach is the set of amino acid strings spelled out by optimal paths in the connected components of a T-Bruijn graph. We illustrate its performance on top-down datasets acquired from carbonic anhydrase 2 (CAH2) and the Fab region of alemtuzumab.
  • We measured the synthesis of diacylglycerol de novo in normal NIH/3T3 fibroblasts and in cells transformed by ras, src, sis and abl oncogenes. Analysis of the incorporation of glucose-derived ¹⁴C into diacylglycerol indicated that neosynthesis of diacylglycerol was constitutively active in the transformed cell lines. Elevated levels of diacylglycerol and persistent activation/down-regulation of protein kinase C reduced the binding of phorbol dibutyrate to transformed cells. This phenomenon could be reversed by blocking the glycolytic pathway, thus indicating that neosynthesized diacylglycerol was responsible for persistent activation and down-regulation of protein kinase C. In transformed cells, protein kinase C activity could not be stimulated by the addition of diolein; however, inhibition of glycolysis restored the ability of transformed cells to respond to diolein. Taken together, these data indicate that constitutive synthesis of diacylglycerol de novo is responsible for activation and down-regulation of protein kinase C in transformed cells, and it may play a role in altered mitogenic signalling.
  • The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to give the user great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects, and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.
  • The prospect of phasing diffraction data sets ‘de novo’ for proteins with previously unseen folds is appealing but largely untested. In a first systematic exploration of phasing with Rosetta de novo models, it is shown that all-atom refinement of coarse-grained models significantly improves both the model quality and performance in molecular replacement with the Phaser software. Fifteen new cases of diffraction data sets that are unambiguously phased with de novo models are presented. These diffraction data sets represent nine space groups and span a large range of solvent contents (33–79%) and asymmetric unit copy numbers (1–4). No correlation is observed between the ease of phasing and the solvent content or asymmetric unit copy number. Instead, a weak correlation is found with the length of the modeled protein: larger proteins required somewhat less accurate models to give successful molecular replacement. Overall, the results of this survey suggest that de novo models can phase diffraction data for approximately one sixth of proteins with sizes of 100 residues or less. However, for many of these cases, ‘de novo phasing with de novo models’ requires significant investment of computational power, much greater than 10³ CPU days per target. Improvements in conformational search methods will be necessary if molecular replacement with de novo models is to become a practical tool for targets without homology to previously solved protein structures.