Reinspection of a Clinical Proteomics Tumor Analysis Consortium (CPTAC) Dataset with Cloud Computing Reveals Abundant Post-Translational Modifications and Protein Sequence Variants
- Optys Tech Corporation, Shrewsbury, MA (United States)
- McGill University Health Center, Montreal (Canada)
- Aptagen LLC, Philadelphia, PA (United States)
- Leiden University Medical Center (The Netherlands)
- Northwestern University, Chicago, IL (United States)
- Thermo Fisher Scientific, Grimes, IA (United States)
- Johns Hopkins University, Baltimore, MD (United States)
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Monash University, Melbourne (Australia)
- Indian Institute of Technology, Mumbai (India)
- University of Texas Southwestern Medical Center, Dallas, TX (United States); Eastern Virginia Medical School, Norfolk, VA (United States)
- Eastern Virginia Medical School, Norfolk, VA (United States)
- University of Toronto (Canada)
- Mayo Clinic, Rochester, MN (United States)
- University of Virginia, Charlottesville, VA (United States)
- Deurion LLC, Ellicott City, MD (United States)
The Clinical Proteomic Tumor Analysis Consortium (CPTAC) has provided some of the most in-depth analyses of the phenotypes of human tumors ever constructed. Today, the majority of proteomic data analysis is still performed using software housed on desktop computers, which limits the number of sequence variants and post-translational modifications (PTMs) that can be considered. The original CPTAC studies limited the search for PTMs to samples that were chemically enriched for those modified peptides. Similarly, the only sequence variants considered were those with strong evidence at the exon or transcript level. In this multi-institutional collaborative reanalysis, we utilized unbiased protein databases containing millions of human sequence variants in conjunction with hundreds of common PTMs. Using these tools, we identified tens of thousands of high-confidence PTMs and sequence variants. We identified 4132 phosphorylated peptides in nonenriched samples, 93% of which were confirmed in the samples that were chemically enriched for phosphopeptides. Our results also cover 90% of the high-confidence variants reported by the original proteogenomics study, without the need for sample-specific next-generation sequencing. Finally, we report fivefold more somatic and germline variants that have independent evidence at the peptide level, including mutations in ERBB2 and BCAS1. In this reanalysis of CPTAC proteomic data with cloud computing, we present an openly available and searchable web resource of the highest-coverage proteomic profiling of human tumors described to date.
- Research Organization:
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Sponsoring Organization:
- National Cancer Institute (NCI)
- Grant/Contract Number:
- 89233218CNA000001
- OSTI ID:
- 1829657
- Report Number(s):
- LA-UR-21-25931
- Journal Information:
- Cancers (Basel), Vol. 13, Issue 20; ISSN 2072-6694
- Publisher:
- MDPI
- Country of Publication:
- United States
- Language:
- English
Similar Records
An Analysis of the Sensitivity of Proteogenomic Mapping of Somatic Mutations and Novel Splicing Events in Cancer
Tandem Mass Tag Labeling Facilitates Reversed-Phase Liquid Chromatography-Mass Spectrometry Analysis of Hydrophilic Phosphopeptides