Proteogenomic strategies for identification of aberrant cancer peptides using large-scale Next Generation Sequencing data

Woo, Sunghee; Cha, Seong Won; Na, Seungjin; Guest, Clark; Liu, Tao; Smith, Richard D.; Rodland, Karin D.; Payne, Samuel H.; Bafna, Vineet

doi:10.1002/pmic.201400206

Abstract

Cancer is driven by the acquisition of somatic DNA lesions. Distinguishing the early driver mutations from subsequent passenger mutations is key to molecular sub-typing of cancers and to the discovery of novel biomarkers. The availability of genomics technologies (mainly whole-genome and exome sequencing, and transcript sampling via RNA-seq, collectively referred to as NGS) has fueled recent studies on somatic mutation discovery. However, the vision is challenged by the complexity, redundancy, and errors in genomic data, and by the difficulty of investigating the proteome using only genomic approaches. Recently, combinations of proteomic and genomic technologies have been increasingly employed. However, the complexity and redundancy of NGS data remain a challenge for proteogenomics, and various trade-offs must be made to allow the searches to take place. This paper provides a discussion of two such trade-offs, relating to large database search and FDR calculations, and their implications for cancer proteogenomics. Moreover, it extends and develops the idea of a unified genomic variant database that can be searched by any mass spectrometry sample. A total of 879 BAM files downloaded from the TCGA repository were used to create a 4.34 GB unified FASTA database which contained 2,787,062 novel splice junctions, 38,464 deletions, 1,105 insertions, and 182,302 substitutions. Proteomic data from a single ovarian carcinoma sample (439,858 spectra) were searched against the database. By applying the most conservative FDR measure, we have identified 524 novel peptides and 65,578 known peptides at a 1% FDR threshold. The novel peptides include interesting examples of doubly mutated peptides, frame-shifts, and non-sample-recruited mutations, which emphasize the strength of our approach.
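The FDR calculations mentioned in the abstract can be illustrated with a minimal sketch. It assumes a generic target-decoy estimate (decoy hits divided by target hits at or above a score cutoff), which this record does not confirm as the authors' method. The function name `score_cutoff_at_fdr` and the `(score, is_decoy)` input layout are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple


def score_cutoff_at_fdr(
    psms: List[Tuple[float, bool]], max_fdr: float = 0.01
) -> Optional[float]:
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: hypothetical (score, is_decoy) pairs; a higher score is a better match.
    The FDR at a cutoff is estimated as decoys / targets among all matches
    scoring at or above that cutoff. Returns None if no cutoff qualifies.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best = None
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Consume every match that shares this score, so ties are counted together.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1]:
                decoys += 1
            else:
                targets += 1
            i += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best = score
    return best
```

In a proteogenomic setting, the same function could be applied once to all matches or separately to known and novel peptides. Which of these corresponds to the "most conservative" measure in the abstract is not stated in this record.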

Authors:

Woo, Sunghee ^[1]; Cha, Seong Won ^[1]; Na, Seungjin ^[2]; Guest, Clark ^[1]; Liu, Tao ^[3]; Smith, Richard D. ^[3]; Rodland, Karin D. ^[3]; Payne, Samuel H. ^[3]; Bafna, Vineet ^[2]

[1] Univ. of California, San Diego, CA (United States). Dept. of Electrical and Computer Engineering
[2] Univ. of California, San Diego, CA (United States). Dept. of Computer Science and Engineering
[3] Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

Publication Date: 2014-11-17

Research Org.: Pacific Northwest National Laboratory (PNNL), Richland, WA (United States). Environmental Molecular Sciences Laboratory (EMSL)

Sponsoring Org.: USDOE; National Institutes of Health (NIH); National Science Foundation (NSF)

OSTI Identifier: 1166875

Report Number(s): PNNL-SA-105664
Journal ID: ISSN 1615-9853; 46206; 48135; 400412000

Grant/Contract Number: AC05-76RL01830; DGE-0504645; U24-CA-160019; P41GM103493

Resource Type: Accepted Manuscript

Journal Name: Proteomics

Additional Journal Information: Journal Volume: 14; Journal Issue: 23-24; Journal ID: ISSN 1615-9853

Publisher: Wiley

Country of Publication: United States

Language: English

Subject: 59 BASIC BIOLOGICAL SCIENCES; 60 APPLIED LIFE SCIENCES; Proteogenomics; Ovarian cancer; Mutated peptide identification; Cancer; MS

Citation Formats


Woo, Sunghee, Cha, Seong Won, Na, Seungjin, Guest, Clark, Liu, Tao, Smith, Richard D., Rodland, Karin D., Payne, Samuel H., and Bafna, Vineet. Proteogenomic strategies for identification of aberrant cancer peptides using large-scale Next Generation Sequencing data. United States: N. p., 2014. Web. doi:10.1002/pmic.201400206.

Woo, Sunghee, Cha, Seong Won, Na, Seungjin, Guest, Clark, Liu, Tao, Smith, Richard D., Rodland, Karin D., Payne, Samuel H., & Bafna, Vineet. Proteogenomic strategies for identification of aberrant cancer peptides using large-scale Next Generation Sequencing data. United States. https://doi.org/10.1002/pmic.201400206

Woo, Sunghee, Cha, Seong Won, Na, Seungjin, Guest, Clark, Liu, Tao, Smith, Richard D., Rodland, Karin D., Payne, Samuel H., and Bafna, Vineet. 2014. "Proteogenomic strategies for identification of aberrant cancer peptides using large-scale Next Generation Sequencing data". United States. https://doi.org/10.1002/pmic.201400206. https://www.osti.gov/servlets/purl/1166875.

                    
@article{osti_1166875,
  title        = {Proteogenomic strategies for identification of aberrant cancer peptides using large-scale Next Generation Sequencing data},
  author       = {Woo, Sunghee and Cha, Seong Won and Na, Seungjin and Guest, Clark and Liu, Tao and Smith, Richard D. and Rodland, Karin D. and Payne, Samuel H. and Bafna, Vineet},
  abstractNote = {Cancer is driven by the acquisition of somatic DNA lesions. Distinguishing the early driver mutations from subsequent passenger mutations is key to molecular sub-typing of cancers and to the discovery of novel biomarkers. The availability of genomics technologies (mainly whole-genome and exome sequencing, and transcript sampling via RNA-seq, collectively referred to as NGS) has fueled recent studies on somatic mutation discovery. However, the vision is challenged by the complexity, redundancy, and errors in genomic data, and by the difficulty of investigating the proteome using only genomic approaches. Recently, combinations of proteomic and genomic technologies have been increasingly employed. However, the complexity and redundancy of NGS data remain a challenge for proteogenomics, and various trade-offs must be made to allow the searches to take place. This paper provides a discussion of two such trade-offs, relating to large database search and FDR calculations, and their implications for cancer proteogenomics. Moreover, it extends and develops the idea of a unified genomic variant database that can be searched by any mass spectrometry sample. A total of 879 BAM files downloaded from the TCGA repository were used to create a 4.34 GB unified FASTA database which contained 2,787,062 novel splice junctions, 38,464 deletions, 1,105 insertions, and 182,302 substitutions. Proteomic data from a single ovarian carcinoma sample (439,858 spectra) were searched against the database. By applying the most conservative FDR measure, we have identified 524 novel peptides and 65,578 known peptides at a 1% FDR threshold. The novel peptides include interesting examples of doubly mutated peptides, frame-shifts, and non-sample-recruited mutations, which emphasize the strength of our approach.},
  doi          = {10.1002/pmic.201400206},
  journal      = {Proteomics},
  number       = {23-24},
  volume       = 14,
  place        = {United States},
  year         = {2014},
  month        = nov
}

Journal Article:

Free Publicly Available Full Text

Accepted Manuscript (DOE)

Publisher's Version of Record

https://doi.org/10.1002/pmic.201400206

Citation Metrics:

Cited by: 48 works

Citation information provided by
Web of Science

Works referenced in this record:

Integrated genomic analyses of ovarian carcinoma
journal, June 2011

The Cancer Genome Atlas Research Network
Nature, Vol. 474, Issue 7353, p. 609-615
DOI: 10.1038/nature10166

Correlation between Protein and mRNA Abundance in Yeast
journal, March 1999

Gygi, Steven P.; Rochon, Yvan; Franza, B. Robert
Molecular and Cellular Biology, Vol. 19, Issue 3
DOI: 10.1128/MCB.19.3.1720

Correlation of mRNA and protein abundance in the developing maize leaf
journal, April 2014

Ponnala, Lalit; Wang, Yupeng; Sun, Qi
The Plant Journal, Vol. 78, Issue 3
DOI: 10.1111/tpj.12482

A Bioinformatics Workflow for Variant Peptide Detection in Shotgun Proteomics
journal, March 2011

Li, Jing; Su, Zengliu; Ma, Ze-Qiang
Molecular & Cellular Proteomics, Vol. 10, Issue 5
DOI: 10.1074/mcp.M110.006536

CanProVar: a human cancer proteome variation database
journal, March 2010

Li, Jing; Duncan, Dexter T.; Zhang, Bing
Human Mutation, Vol. 31, Issue 3
DOI: 10.1002/humu.21176

Protein Identification Using Customized Protein Sequence Databases Derived from RNA-Seq Data
journal, December 2011

Wang, Xiaojing; Slebos, Robbert J. C.; Wang, Dong
Journal of Proteome Research, Vol. 11, Issue 2
DOI: 10.1021/pr200766z

TopHat: discovering splice junctions with RNA-Seq
journal, March 2009

Trapnell, Cole; Pachter, Lior; Salzberg, Steven L.
Bioinformatics, Vol. 25, Issue 9
DOI: 10.1093/bioinformatics/btp120

TopHat-Fusion: an algorithm for discovery of novel fusion transcripts
journal, January 2011

Kim, Daehwan; Salzberg, Steven L.
Genome Biology, Vol. 12, Issue 8
DOI: 10.1186/gb-2011-12-8-r72

TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions
journal, January 2013

Kim, Daehwan; Pertea, Geo; Trapnell, Cole
Genome Biology, Vol. 14, Issue 4
DOI: 10.1186/gb-2013-14-4-r36

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data
journal, July 2010

McKenna, A.; Hanna, M.; Banks, E.
Genome Research, Vol. 20, Issue 9
DOI: 10.1101/gr.107524.110

A framework for variation discovery and genotyping using next-generation DNA sequencing data
journal, April 2011

DePristo, Mark A.; Banks, Eric; Poplin, Ryan
Nature Genetics, Vol. 43, Issue 5
DOI: 10.1038/ng.806

Proteogenomic Database Construction Driven from Large Scale RNA-seq Data
journal, July 2013

Woo, Sunghee; Cha, Seong Won; Merrihew, Gennifer
Journal of Proteome Research, Vol. 13, Issue 1
DOI: 10.1021/pr400294c

Improving gene annotation using peptide mass spectrometry
journal, January 2007

Tanner, S.; Shen, Z.; Ng, J.
Genome Research, Vol. 17, Issue 2
DOI: 10.1101/gr.5646507

Discovery and revision of Arabidopsis genes by proteogenomics
journal, December 2008

Castellana, N. E.; Payne, S. H.; Shen, Z.
Proceedings of the National Academy of Sciences, Vol. 105, Issue 52
DOI: 10.1073/pnas.0811066106

Proteogenomics to discover the full coding content of genomes: A computational perspective
journal, October 2010

Castellana, Natalie; Bafna, Vineet
Journal of Proteomics, Vol. 73, Issue 11
DOI: 10.1016/j.jprot.2010.06.007

An Automated Proteogenomic Method Uses Mass Spectrometry to Reveal Novel Genes in Zea mays
journal, October 2013

Castellana, Natalie E.; Shen, Zhouxin; He, Yupeng
Molecular & Cellular Proteomics, Vol. 13, Issue 1
DOI: 10.1074/mcp.M113.031260

Novel peptide identification from tandem mass spectra using ESTs and sequence database compression
journal, January 2007

Edwards, Nathan J.
Molecular Systems Biology, Vol. 3, Issue 1
DOI: 10.1038/msb4100142

The Sequence Alignment/Map format and SAMtools
journal, June 2009

Li, H.; Handsaker, B.; Wysoker, A.
Bioinformatics, Vol. 25, Issue 16
DOI: 10.1093/bioinformatics/btp352

The Generating Function of CID, ETD, and CID/ETD Pairs of Tandem Mass Spectra: Applications to Database Search
journal, September 2010

Kim, Sangtae; Mischerikow, Nikolai; Bandeira, Nuno
Molecular & Cellular Proteomics, Vol. 9, Issue 12
DOI: 10.1074/mcp.M110.003731

Ensembl 2013
journal, November 2012

Flicek, Paul; Ahmed, Ikhlak; Amode, M. Ridwan
Nucleic Acids Research, Vol. 41, Issue D1
DOI: 10.1093/nar/gks1236

dbSNP: the NCBI database of genetic variation
journal, January 2001

Sherry, S. T.
Nucleic Acids Research, Vol. 29, Issue 1
DOI: 10.1093/nar/29.1.308

De novo derivation of proteomes from transcriptomes for transcript and protein identification
journal, November 2012

Evans, Vanessa C.; Barker, Gary; Heesom, Kate J.
Nature Methods, Vol. 9, Issue 12
DOI: 10.1038/nmeth.2227

Proteogenomic Analysis of Bacteria and Archaea: A 46 Organism Case Study
journal, November 2011

Venter, Eli; Smith, Richard D.; Payne, Samuel H.
PLoS ONE, Vol. 6, Issue 11
DOI: 10.1371/journal.pone.0027587

customProDB: an R package to generate customized protein databases from RNA-Seq data for proteomics search
journal, September 2013

Wang, Xiaojing; Zhang, Bing
Bioinformatics, Vol. 29, Issue 24
DOI: 10.1093/bioinformatics/btt543

Ancient genomes reveal social and genetic structure of Late Neolithic Switzerland
journal, April 2020

Furtwängler, Anja; Rohrlach, A. B.; Lamnidis, Thiseas C.
Nature Communications, Vol. 11, Issue 1
DOI: 10.1038/s41467-020-15560-x

SNHG7 is a lncRNA oncogene controlled by Insulin-like Growth Factor signaling through a negative feedback loop to tightly regulate proliferation
journal, May 2020

Boone, David N.; Warburton, Andrew; Som, Sreeroopa
Scientific Reports, Vol. 10, Issue 1
DOI: 10.1038/s41598-020-65109-7

Integrated genomic analyses of ovarian carcinoma
text, January 2011

Perou, Charles
The University of North Carolina at Chapel Hill University Libraries
DOI: 10.17615/hvp3-wg08

Comprehensive molecular portraits of human breast tumours
text, January 2012

Perou, Charles
The University of North Carolina at Chapel Hill University Libraries
DOI: 10.17615/hyeb-c392

TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions
text, January 2013

Kim, Daehwan; Pertea, Geo; Trapnell, Cole
Springer Nature
DOI: 10.13016/7jyb-2sdy

Works referencing / citing this record:

Proteogenomics from a bioinformatics angle: A growing field
journal, December 2015

Menschaert, Gerben; Fenyö, David
Mass Spectrometry Reviews, Vol. 36, Issue 5
DOI: 10.1002/mas.21483

Connecting Proteomics to Next‐Generation Sequencing: Proteogenomics and Its Current Applications in Biology
journal, December 2018

Low, Teck Yew; Mohtar, M. Aiman; Ang, Mia Yang
PROTEOMICS, Vol. 19, Issue 10
DOI: 10.1002/pmic.201800235

Comprehensive analysis of human protein N-termini enables assessment of various protein forms
journal, July 2017

Yeom, Jeonghun; Ju, Shinyeong; Choi, YunJin
Scientific Reports, Vol. 7, Issue 1
DOI: 10.1038/s41598-017-06314-9

FusionPro, a Versatile Proteogenomic Tool for Identification of Novel Fusion Transcripts and Their Potential Translation Products in Cancer Cells
journal, June 2019

Kim, Chae-Yeon; Na, Keun; Park, Saeram
Molecular & Cellular Proteomics, Vol. 18, Issue 8
DOI: 10.1074/mcp.ra119.001456

Onco-proteogenomics: Multi-omics level data integration for accurate phenotype prediction
journal, August 2017

Dimitrakopoulos, Lampros; Prassas, Ioannis; Diamandis, Eleftherios P.
Critical Reviews in Clinical Laboratory Sciences, Vol. 54, Issue 6
DOI: 10.1080/10408363.2017.1384446

Origins and clinical relevance of proteoforms in pediatric malignancies
journal, February 2019

Lorentzian, Amanda; Uzozie, Anuli; Lange, Philipp F.
Expert Review of Proteomics, Vol. 16, Issue 3
DOI: 10.1080/14789450.2019.1575206

High throughput discovery of protein variants using proteomics informed by transcriptomics
journal, April 2018

Saha, Shyamasree; Matthews, David A.; Bessant, Conrad
Nucleic Acids Research, Vol. 46, Issue 10
DOI: 10.1093/nar/gky295

Proteogenomic annotation of the Chinese hamster reveals extensive novel translation events and endogenous retroviral elements
journal, November 2018

Li, Shangzhong; Cha, Seong Won; Hefner, Kelly
Journal of Proteome Research
DOI: 10.1101/468181

Evaluating the effect of database inflation in proteogenomic search on sensitive and reliable peptide identification
journal, December 2016

Li, Honglan; Joh, Yoon Sung; Kim, Hyunwoo
BMC Genomics, Vol. 17, Issue S13
DOI: 10.1186/s12864-016-3327-5

Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins
journal, October 2017

Samandi, Sondos; Roy, Annie V.; Delcourt, Vivian
eLife, Vol. 6
DOI: 10.7554/elife.27860

CrossHub: a tool for multi-way analysis of The Cancer Genome Atlas (TCGA) in the context of gene expression regulation mechanisms
journal, January 2016

Krasnov, George S.; Dmitriev, Alexey A.; Melnikova, Nataliya V.
Nucleic Acids Research, Vol. 44, Issue 7
DOI: 10.1093/nar/gkv1478

Proteogenomic analysis prioritises functional single nucleotide variants in cancer samples
journal, September 2017

Ma, Shiyong; Menon, Ranjeeta; Poulos, Rebecca C.
Oncotarget, Vol. 8, Issue 56
DOI: 10.18632/oncotarget.21339

Similar Records in DOE PAGES and OSTI.GOV collections:

An Analysis of the Sensitivity of Proteogenomic Mapping of Somatic Mutations and Novel Splicing Events in Cancer

Journal Article Ruggles, Kelly V. ; Tang, Zuojian ; Wang, Xuya ; ... - Molecular and Cellular Proteomics

Improvements in mass spectrometry (MS)-based peptide sequencing provide a new opportunity to determine whether polymorphisms, mutations and splice variants identified in cancer cells are translated. Herein we therefore describe a proteogenomic data integration tool (QUILTS) and illustrate its application to whole genome, transcriptome and global MS peptide sequence datasets generated from a pair of luminal and basal-like breast cancer patient derived xenografts (PDX). The sensitivity of proteogenomic analysis for single nucleotide variant (SNV) expression and novel splice junction (NSJ) detection was probed using multiple MS/MS process replicates. Despite over thirty sample replicates, only about 10% of all SNV (somatic and …
https://doi.org/10.1074/mcp.M115.056226

Reinspection of a Clinical Proteomics Tumor Analysis Consortium (CPTAC) Dataset with Cloud Computing Reveals Abundant Post-Translational Modifications and Protein Sequence Variants

Journal Article Prakash, Amol ; Taylor, Lorne ; Varkey, Manu ; ... - Cancers (Basel)

The Clinical Proteomic Tumor Analysis Consortium (CPTAC) has provided some of the most in-depth analyses of the phenotypes of human tumors ever constructed. Today, the majority of proteomic data analysis is still performed using software housed on desktop computers which limits the number of sequence variants and post-translational modifications that can be considered. The original CPTAC studies limited the search for PTMs to only samples that were chemically enriched for those modified peptides. Similarly, the only sequence variants considered were those with strong evidence at the exon or transcript level. In this multi-institutional collaborative reanalysis, we utilized unbiased protein databases …
https://doi.org/10.3390/cancers13205034

Proteogenomic characterization of human colon and rectal cancer

Journal Article Zhang, Bing ; Wang, Jing ; Wang, Xiaojing ; ... - Nature, 513(7518):382-387

We analyzed proteomes of colon and rectal tumors previously characterized by the Cancer Genome Atlas (TCGA) and performed integrated proteogenomic analyses. Protein sequence variants encoded by somatic genomic variations displayed reduced expression compared to protein variants encoded by germline variations. mRNA transcript abundance did not reliably predict protein expression differences between tumors. Proteomics identified five protein expression subtypes, two of which were associated with the TCGA "MSI/CIMP" transcriptional subtype, but had distinct mutation and methylation patterns and associated with different clinical outcomes. Although CNAs showed strong cis- and trans-effects on mRNA expression, relatively few of these extend to the protein …
https://doi.org/10.1038/nature13438

Base changes in tumour DNA have the power to reveal the causes and evolution of cancer

Journal Article Hollstein, M. ; Alexandrov, L. B. ; Wild, C. P. ; ... - Oncogene

Next-generation sequencing (NGS) technology has demonstrated that the cancer genomes are peppered with mutations. Although most somatic tumour mutations are unlikely to have any role in the cancer process per se, the spectra of DNA sequence changes in tumour mutation catalogues have the potential to identify the mutagens, and to reveal the mutagenic processes responsible for human cancer. Very recently, a novel approach for data mining of the vast compilations of tumour NGS data succeeded in separating and precisely defining at least 30 distinct patterns of sequence change hidden in mutation databases. At least half of these mutational signatures can …
Cited by 40
https://doi.org/10.1038/onc.2016.192

Integrated Proteogenomic Characterization across Major Histological Types of Pediatric Brain Cancer

Journal Article Petralia, Francesca ; Tignor, Nicole ; Reva, Boris ; ... - Cell

We report a comprehensive proteogenomic analysis, including whole genome sequencing, RNA sequencing, proteomic and phosphoproteomic profiling, of 218 tumors across seven histologic types of childhood brain cancer. Proteomic data identifies common biological themes that span histologic boundaries, suggesting that treatments used for one histologic type may be applied effectively to other tumors sharing similar proteomic features. Immune landscape characterization reveals diverse tumor microenvironments across and within diagnoses. Proteomic data further reveal functional impacts of somatic mutations and CNVs not evident in transcriptomic data alone. Kinase-substrate association and coexpression network analysis identifies important biological mechanisms of tumorigenesis. Survival analysis of high …
https://doi.org/10.1016/j.cell.2020.10.044
