DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: HiFiAdapterFilt, a memory efficient read processing pipeline, prevents occurrence of adapter sequence in PacBio HiFi reads and their negative impacts on genome assembly

Journal Article · BMC Genomics

Abstract

Background: Pacific Biosciences HiFi read technology is currently the industry standard for high accuracy long-read sequencing that has been widely adopted by large sequencing and assembly initiatives for generation of de novo assemblies in non-model organisms. Though adapter contamination filtering is routine in traditional short-read analysis pipelines, it has not been widely adopted for HiFi workflows.

Results: Analysis of 55 publicly available HiFi datasets revealed that a read-sanitation step to remove sequence artifacts derived from PacBio library preparation from read pools is necessary as adapter sequences can be erroneously integrated into assemblies.

Conclusions: Here we describe the nature of adapter contaminated reads, their consequences in assembly, and present HiFiAdapterFilt, a simple and memory efficient solution for removing adapter contaminated reads prior to assembly.
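The read-sanitation step described in the abstract can be illustrated with a minimal sketch. This is not the HiFiAdapterFilt implementation: it swaps in exact substring matching for whatever adapter detection the tool performs, and the adapter string, function names, and read tuples are hypothetical placeholders chosen for illustration. Reads are streamed through a generator so that only one read is held in memory at a time.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical placeholder adapter for illustration only.
# This is NOT a real PacBio adapter sequence.
TOY_ADAPTER = "AAGCTTGGCACC"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def filter_adapter_reads(
    reads: Iterable[Tuple[str, str]],
    adapter: str = TOY_ADAPTER,
) -> Iterator[Tuple[str, str]]:
    """Yield only the (read_id, sequence) pairs that lack the adapter.

    Assumption: exact substring matching on either strand stands in
    for a proper alignment-based adapter search.
    """
    forward = adapter.upper()
    reverse = reverse_complement(forward)
    for read_id, seq in reads:
        upper = seq.upper()
        if forward in upper or reverse in upper:
            continue  # drop the whole read rather than trimming it
        yield read_id, seq


if __name__ == "__main__":
    example_reads = [
        ("r1", "TTTT" + TOY_ADAPTER + "GGGG"),                      # adapter, forward strand
        ("r2", "ACGTACGTACGT"),                                     # clean read
        ("r3", "CC" + reverse_complement(TOY_ADAPTER) + "AA"),      # adapter, reverse strand
    ]
    for read_id, _ in filter_adapter_reads(example_reads):
        print(read_id)
```

Dropping a contaminated read entirely, rather than trimming it, follows the abstract's wording about removing adapter contaminated reads; a real workflow would need tolerance for mismatches and partial adapter copies, which exact matching does not provide.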

Sponsoring Organization:
USDOE
OSTI ID:
1846047
Journal Information:
Journal Name: BMC Genomics; Vol. 23; Journal Issue: 1; ISSN 1471-2164
Publisher:
Springer Science + Business Media
Country of Publication:
United Kingdom
Language:
English

