U.S. Department of Energy
Office of Scientific and Technical Information

EDGE COVID-19: a web platform to generate submission-ready genomes from SARS-CoV-2 sequencing efforts

Journal Article · Bioinformatics
Abstract

Summary

Genomics has become an essential technology for surveilling emerging infectious disease outbreaks. Laboratories worldwide use a range of technologies and strategies for pathogen genome enrichment and sequencing, together with different, and sometimes ad hoc, analytical procedures for generating genome sequences. A fully integrated analytical process that goes from raw sequence to consensus genome, suited to outbreaks such as the ongoing COVID-19 pandemic, is critical to provide a solid genomic basis for epidemiological analyses and well-informed decision making. We have developed a web-based platform, EDGE COVID-19 (EC-19), with integrated bioinformatic workflows that help to provide consistent, high-quality analysis of SARS-CoV-2 sequencing data generated on either Illumina or Oxford Nanopore Technologies (ONT) platforms. Through an intuitive web-based interface, this workflow automates data quality control, SARS-CoV-2 reference-based genome variant and consensus calling, and lineage determination, and it provides the ability to submit the consensus sequence and necessary metadata to GenBank, GISAID and INSDC raw data repositories. We tested workflow usability using real-world data, validated the accuracy of variant and lineage analysis using several test datasets, and performed detailed comparisons with results from the COVID-19 Galaxy Project workflow. Our analyses indicate that EC-19 workflows generate high-quality SARS-CoV-2 genomes. Finally, we share a perspective on the patterns observed with Illumina versus ONT technologies and their impact on workflow congruence and differences.

Availability and implementation

https://edge-covid19.edgebioinformatics.org and https://github.com/LANL-Bioinformatics/EDGE/tree/SARS-CoV2

Supplementary information

Supplementary data are available at Bioinformatics online.

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
Defense Threat Reduction Agency (DTRA); National Science Foundation (NSF); USDOE; USDOE Laboratory Directed Research and Development (LDRD) Program
Grant/Contract Number:
89233218CNA000001
OSTI ID:
1868101
Alternate ID(s):
OSTI ID: 1861895
OSTI ID: 2470536
OSTI ID: 2477218
Report Number(s):
LA-UR--20-24274
Journal Information:
Bioinformatics, Vol. 38, Issue 10; ISSN 1367-4803
Publisher:
Oxford University Press
Country of Publication:
United Kingdom
Language:
English
