DOE PAGES

Title: Proteogenomic strategies for identification of aberrant cancer peptides using large-scale Next Generation Sequencing data

Cancer is driven by the acquisition of somatic DNA lesions. Distinguishing the early driver mutations from subsequent passenger mutations is key to molecular sub-typing of cancers and to the discovery of novel biomarkers. The availability of genomics technologies (mainly whole-genome and exome sequencing, and transcript sampling via RNA-seq, collectively referred to as NGS) has fueled recent studies on somatic mutation discovery. However, the vision is challenged by the complexity, redundancy, and errors in genomic data, and by the difficulty of investigating the proteome using only genomic approaches. Recently, combinations of proteomic and genomic technologies have been increasingly employed. However, the complexity and redundancy of NGS data remain a challenge for proteogenomics, and various trade-offs must be made to allow the searches to take place. This paper discusses two such trade-offs, relating to large database search and FDR calculations, and their implications for cancer proteogenomics. Moreover, it extends and develops the idea of a unified genomic variant database that can be searched by any mass spectrometry sample. A total of 879 BAM files downloaded from the TCGA repository were used to create a 4.34 GB unified FASTA database, which contained 2,787,062 novel splice junctions, 38,464 deletions, 1,105 insertions, and 182,302 substitutions. Proteomic data from a single ovarian carcinoma sample (439,858 spectra) were searched against the database. By applying the most conservative FDR measure, we identified 524 novel peptides and 65,578 known peptides at a 1% FDR threshold. The novel peptides include interesting examples of doubly mutated peptides, frame-shifts, and non-sample-recruited mutations, which emphasize the strength of our approach.
Authors:
 [1] ;  [1] ;  [2] ;  [1] ;  [3] ;  [3] ;  [3] ;  [3] ;  [2]
  1. Univ. of California, San Diego, CA (United States). Dept. of Electrical and Computer Engineering
  2. Univ. of California, San Diego, CA (United States). Dept. of Computer Science and Engineering
  3. Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Publication Date:
OSTI Identifier:
1166875
Report Number(s):
PNNL-SA--105664
Journal ID: ISSN 1615-9853; 46206; 48135; 400412000
Grant/Contract Number:
AC05-76RL01830; DGE-0504645; U24-CA-160019; P41GM103493
Type:
Accepted Manuscript
Journal Name:
Proteomics
Additional Journal Information:
Journal Volume: 14; Journal Issue: 23-24; Journal ID: ISSN 1615-9853
Publisher:
Wiley
Research Org:
Pacific Northwest National Laboratory (PNNL), Richland, WA (US), Environmental Molecular Sciences Laboratory (EMSL)
Sponsoring Org:
USDOE; NIH; NSF
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 60 APPLIED LIFE SCIENCES; Proteogenomics; Ovarian cancer; Mutated peptide identification; Cancer; MS