OSTI.GOV U.S. Department of Energy
Office of Scientific and Technical Information

Title: CasCollect: targeted assembly of CRISPR-associated operons from high-throughput sequencing data

Journal Article · NAR Genomics and Bioinformatics
[1]; [2]; [2]; [3]
  1. Molecular and Microbiology, Sandia National Laboratories, Albuquerque, NM 87185, USA
  2. Computational Biology and Biophysics, Sandia National Laboratories, Albuquerque, NM 87185, USA
  3. Systems Biology, Sandia National Laboratories, Livermore, CA 94550, USA

Abstract: CRISPR arrays and CRISPR-associated (Cas) proteins comprise a widespread adaptive immune system in bacteria and archaea. These systems function as a defense against exogenous parasitic mobile genetic elements that include bacteriophages, plasmids and foreign nucleic acids. With the continuous spread of antibiotic resistance, knowledge of pathogen susceptibility to bacteriophage therapy is becoming more critical. Additionally, gene-editing applications would benefit from the discovery of new cas genes with favorable properties. While next-generation sequencing has produced staggering quantities of data, transitioning from raw sequencing reads to the identification of CRISPR/Cas systems has remained challenging. This is especially true for metagenomic data, which has the highest potential for identifying novel cas genes. We report a comprehensive computational pipeline, CasCollect, for the targeted assembly and annotation of cas genes and CRISPR arrays (even isolated arrays) from raw sequencing reads. Benchmarking shows that our targeted assembly pipeline runs almost two orders of magnitude faster than conventional assembly and annotation, while retaining the ability to detect CRISPR arrays and cas genes. CasCollect is highly versatile: it can be reconfigured with user-provided Hidden Markov Models and/or reference nucleotide sequences for targeted assembly of any specialty gene set.

Sponsoring Organization:
USDOE
Grant/Contract Number:
NA-0003525
OSTI ID:
1657646
Journal Information:
NAR Genomics and Bioinformatics, Vol. 2, Issue 3; ISSN 2631-9268
Publisher:
Oxford University Press
Country of Publication:
United Kingdom
Language:
English
