OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Rapid evaluation and quality control of next generation sequencing data with FaQCs

Journal Article · BMC Bioinformatics
 [1];  [1]
  1. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

Background: Next generation sequencing (NGS) technologies, which parallelize the sequencing process and produce thousands to millions, or even hundreds of millions, of sequences in a single run, have revolutionized genomic and genetic research. Because of the vagaries of each platform's sequencing chemistry, experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect and often declines toward the end of the read. These errors invariably affect downstream analyses and applications and should therefore be identified early to mitigate any unforeseen effects.

Results: Here we present FaQCs (FastQ Quality Control Software), a novel tool that rapidly processes large volumes of data and improves upon previous solutions for monitoring read quality and removing poor-quality data from sequencing runs. Both processing speed and the memory needed to store all required information have been optimized through algorithmic and parallel-processing solutions. The automatically generated PDF report includes a side-by-side comparison of the trimmed output with the original data. We show how the tool aids data analysis with several examples, including a higher percentage of reads recruited to references, improved single nucleotide polymorphism (SNP) identification, and improved de novo assembly metrics.

Conclusion: FaQCs combines several features of currently available applications into a single, user-friendly process and adds unique capabilities such as filtering PhiX control sequences, converting between FASTQ formats, and multi-threading. Summaries of the original and trimmed data are presented in a variety of graphics and reports, providing a simple way to perform data quality control and assurance.
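To make two of the operations mentioned above concrete (FASTQ quality-encoding conversion and trimming of a low-quality read end), here is a minimal Python sketch. It is an illustration under stated assumptions, not FaQCs's own code or algorithm: it assumes reads use either the Phred+33 encoding (Sanger, Illumina 1.8+) or the Phred+64 encoding (Illumina 1.3 to 1.7), ignores the older Solexa scale, and swaps in the simple 3'-end trimming rule described in the BWA manual (keep length argmax_x of the sum over i > x of (threshold - q_i), applied only when the last base falls below the threshold). All function names and the default threshold of 20 are hypothetical.

```python
from typing import List, Tuple


def decode_quals(qual: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ quality string to Phred scores.

    offset=33 assumes Sanger / Illumina 1.8+ encoding; offset=64 assumes
    Illumina 1.3-1.7 encoding (an assumption, not detected automatically).
    """
    scores = [ord(c) - offset for c in qual]
    if any(q < 0 for q in scores):
        raise ValueError("quality character below the assumed offset")
    return scores


def phred64_to_phred33(qual: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33 (scores unchanged)."""
    return "".join(chr(ord(c) - 31) for c in qual)


def bwa_style_trim_length(scores: List[int], threshold: int = 20) -> int:
    """Return how many leading bases to keep, using a simplified BWA-style rule.

    Keeps length x maximizing sum(threshold - q_i) over the removed tail
    positions x..n-1; nothing is trimmed if the last base meets the threshold.
    """
    n = len(scores)
    if n == 0 or scores[-1] >= threshold:
        return n
    best_len, best_sum, running = n, 0, 0
    # Walk from the 3' end toward the 5' end, growing the removed tail.
    for x in range(n - 1, -1, -1):
        running += threshold - scores[x]
        if running > best_sum:
            best_sum, best_len = running, x
    return best_len


def trim_record(seq: str, qual: str, offset: int = 33,
                threshold: int = 20) -> Tuple[str, str]:
    """Trim a read and its quality string at the computed keep length."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    keep = bwa_style_trim_length(decode_quals(qual, offset), threshold)
    return seq[:keep], qual[:keep]


if __name__ == "__main__":
    # 'I' is Q40 and '#' is Q2 in Phred+33, so the three-base Q2 tail is cut.
    print(*trim_record("ACGTACGT", "IIIII###"))  # ACGTA IIIII
```

Real tools typically layer more onto this, for example a minimum read length after trimming, trimming of both ends, and removal of reads with ambiguous bases; the sketch shows only the core per-read computation.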

Research Organization: Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization: USDOE
Grant/Contract Number: AC52-06NA25396
OSTI ID: 1200616
Report Number(s): LA-UR-13-20812
Journal Information: BMC Bioinformatics, Vol. 15, Issue 1; ISSN 1471-2105
Publisher: BioMed Central
Country of Publication: United States
Language: English
Citation Metrics: Cited by 128 works (citation information provided by Web of Science)


Cited By (44)

Guidelines for RNA-seq projects: applications and opportunities in non-model decapod crustacean species journal July 2018
Metagenomic analysis reveals the prevalence and persistence of antibiotic- and heavy metal-resistance genes in wastewater treatment plant journal June 2018
Next-generation sequencing analysis of multiplex families with atypical psychosis journal October 2018
No metagenomic evidence of tumorigenic viruses in cancers from a selected cohort of immunosuppressed subjects journal December 2019
Standardized phylogenetic and molecular evolutionary analysis applied to species across the microbial tree of life journal February 2020
Development of oligonucleotide-based antagonists of Ebola virus protein 24 inhibiting its interaction with karyopherin alpha 1 journal January 2018
The histone modification H3 lysine 27 tri-methylation has conserved gene regulatory roles in the triplicated genome of Brassica rapa L. journal October 2019
Enabling the democratization of the genomics revolution with a fully integrated web-based bioinformatics platform journal November 2016
KAT: A K-mer Analysis Toolkit to quality control NGS datasets and genome assemblies posted_content October 2016
Oncogenic memory underlying minimal residual disease in breast cancer preprint January 2020
Mechanical incompatibility caused by modifications of multiple male genital structures using genomic introgression in Drosophila journal September 2018
“Candidatus Nitrosotenuis aquarius,” an Ammonia-Oxidizing Archaeon from a Freshwater Aquarium Biofilter journal June 2018
Draft Genome Sequence of Sorghum Grain Mold Fungus Epicoccum sorghinum, a Producer of Tenuazonic Acid journal January 2017
Draft Genome Sequences of Two Staphylococcus warneri Clinical Isolates, Strains SMA0023-04 (UGA3) and SMA0670-05 (UGA28), from Siaya County Referral Hospital, Siaya, Kenya journal April 2019
Genome Sequence of a Staphylococcus xylosus Clinical Isolate, Strain SMA0341-04 (UGA5), from Siaya County Referral Hospital in Siaya, Kenya journal April 2019
Genome Sequence of Staphylococcus pettenkoferi Strain SMA0010-04 (UGA20), a Clinical Isolate from Siaya County Referral Hospital in Siaya, Kenya journal April 2019
Genome Sequences of a Staphylococcus aureus Clinical Isolate, Strain SMA0034-04 (UGA22), from Siaya County Referral Hospital in Siaya, Kenya journal April 2019
ADEPT, a dynamic next generation sequencing data error-detection program with trimming journal February 2016
FastProNGS: fast preprocessing of next-generation sequencing reads journal June 2019
FQStat: a parallel architecture for very high-speed assessment of sequencing quality metrics journal August 2019
A binning tool to reconstruct viral haplotypes from assembled contigs journal November 2019
Entropy of mitochondrial DNA circulating in blood is associated with hepatocellular carcinoma journal June 2019
Sunbeam: an extensible pipeline for analyzing metagenomic sequencing experiments journal March 2019
Mobile resistome of human gut and pathogen drives anthropogenic bloom of antibiotic resistance journal January 2020
Freshwater viral metagenome reveals novel and functional phage-borne antibiotic resistance genes journal June 2020
Effects of cadmium perturbation on the microbial community structure and heavy metal resistome of a tropical agricultural soil journal May 2020
Transcriptomics technologies journal May 2017
TGFβ signaling related genes are involved in hormonal mediation during termite soldier differentiation journal April 2018
Using targeted next-generation sequencing to characterize genetic differences associated with insecticide resistance in Culex quinquefasciatus populations from the southern U.S. journal July 2019
Streptococcus mutans Displays Altered Stress Responses While Enhancing Biofilm Formation by Lactobacillus casei in Mixed-Species Consortium journal December 2017
Advances and Challenges in Metatranscriptomic Analysis journal September 2019
Detection of Abrin-Like and Prepropulchellin-Like Toxin Genes and Transcripts Using Whole Genome Sequencing and Full-Length Transcript Sequencing of Abrus precatorius journal November 2019
Characterizing Phage Genomes for Therapeutic Applications journal April 2018
KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies journal October 2016
Sunbeam: an extensible pipeline for analyzing metagenomic sequencing experiments posted_content January 2019
Transcriptomics technologies text January 2021
A binning tool to reconstruct viral haplotypes from assembled contigs journal July 2019
Freshwater Viral Metagenome Reveals Novel and Functional Phage-borne Antibiotic Resistance Genes journal June 2020
Parallel and Gradual Genome Erosion in the Blattabacterium Endosymbionts of Mastotermes darwiniensis and Cryptocercus Wood Roaches journal June 2018
Selective sets of mRNAs localize to extracellular paramural bodies in a rice glup6 mutant journal August 2018
Novel bioinformatics quality control metric for next-generation sequencing experiments in the clinical context journal September 2019
Association of colitis with gut-microbiota dysbiosis in clathrin adapter AP-1B knockout mice journal March 2020
Identification of a master transcription factor and a regulatory mechanism for desiccation tolerance in the anhydrobiotic cell line Pv11 journal March 2020
Constructing and Characterizing Bacteriophage Libraries for Phage Therapy of Human Infections journal November 2019