
Title: Conserved unique peptide patterns (CUPP) online platform 2.0: implementation of +1000 JGI fungal genomes

Journal Article · Nucleic Acids Research
[1]; [2]; [3]; [4]; [2]
  1. Technical University of Denmark, Lyngby (Denmark); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). USDOE Joint BioEnergy Institute
  2. Technical University of Denmark, Lyngby (Denmark)
  3. BioEconomy, Valby (Denmark)
  4. Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). USDOE Joint BioEnergy Institute; University of California, Berkeley, CA (United States)

Carbohydrate-processing enzymes, CAZymes, are classified into families based on sequence and three-dimensional fold. Because many CAZyme families contain members of diverse molecular function (different EC numbers), sophisticated tools are required to further delineate these enzymes. Such delineation is provided by the peptide-based clustering method CUPP, Conserved Unique Peptide Patterns. CUPP operates synergistically with the CAZy family/subfamily categorizations to allow systematic exploration of CAZymes by defining small protein groups with shared sequence motifs. The updated CUPP library contains 21,930 such motif groups comprising 3,842,628 proteins. The new implementation of the CUPP webserver, https://cupp.info/, now includes all published fungal and algal genomes from the Joint Genome Institute (JGI) genome resources MycoCosm and PhycoCosm, dynamically subdivided into motif groups of CAZymes. This allows users to browse the JGI portals for specific predicted functions or specific protein families from genome sequences, so that a genome can be searched for proteins having specific characteristics. All JGI proteins have a hyperlink to a summary page, which links to the predicted gene splicing and indicates which regions have RNA support. The new CUPP implementation also includes an update of the annotation algorithm that uses only a fourth of the RAM while enabling multi-threading, providing an annotation speed below 1 ms/protein.
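
The idea of grouping proteins by shared sequence motifs and annotating a query by peptide matches can be illustrated with a toy sketch. This is not the CUPP algorithm or its API: the function names (peptides, build_group_motifs, annotate), the peptide length k=4, the scoring by raw hit count, and the sequences are all hypothetical, chosen only to show what "conserved" (present in every member of a group) and "unique" (present in no other group) could mean for short peptides.

```python
from collections import defaultdict


def peptides(seq, k=4):
    """Return the set of all contiguous k-residue peptides in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_group_motifs(groups, k=4):
    """Map each group to peptides found in all its members and in no other group."""
    # Conserved: peptides shared by every member of the group.
    conserved = {
        name: set.intersection(*(peptides(s, k) for s in seqs))
        for name, seqs in groups.items()
    }
    # Unique: keep only peptides that are conserved in exactly one group.
    counts = defaultdict(int)
    for peps in conserved.values():
        for p in peps:
            counts[p] += 1
    return {
        name: {p for p in peps if counts[p] == 1}
        for name, peps in conserved.items()
    }


def annotate(query, motifs, k=4):
    """Return (best_group, hit_count), or (None, 0) when no peptide matches."""
    q = peptides(query, k)
    best, best_hits = None, 0
    for name, peps in sorted(motifs.items()):
        hits = len(q & peps)
        if hits > best_hits:
            best, best_hits = name, hits
    return best, best_hits


# Hypothetical toy data: two groups of two short made-up sequences each.
toy_groups = {
    "group_A": ["MKTAYWGDEL", "MKTAYWGNEL"],
    "group_B": ["MSTAHWGDKL", "MSTAHWGDRL"],
}
motifs = build_group_motifs(toy_groups)
print(annotate("QQKTAYWGQQ", motifs))  # ('group_A', 3)
```

Here the query shares three conserved, unique peptides (KTAY, TAYW, AYWG) with group_A and none with group_B. Because annotation reduces to set lookups on short peptides, each protein can be scored independently, which is the kind of workload that lends itself to multi-threading as the abstract describes.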

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER); USDOE Office of Science (SC), Basic Energy Sciences (BES). Scientific User Facilities (SUF); Novo Nordisk Foundation
Grant/Contract Number:
AC02-05CH11231; NNF21OC0066330; NNF22OC0072911
OSTI ID:
2234073
Journal Information:
Nucleic Acids Research, Vol. 51, Issue W1; ISSN 0305-1048
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English
