HiFiAdapterFilt, a memory efficient read processing pipeline, prevents occurrence of adapter sequence in PacBio HiFi reads and their negative impacts on genome assembly
Abstract
Background: Pacific Biosciences HiFi read technology is currently the industry standard for high accuracy long-read sequencing that has been widely adopted by large sequencing and assembly initiatives for generation of de novo assemblies in non-model organisms. Though adapter contamination filtering is routine in traditional short-read analysis pipelines, it has not been widely adopted for HiFi workflows.
Results: Analysis of 55 publicly available HiFi datasets revealed that a read-sanitation step to remove sequence artifacts derived from PacBio library preparation from read pools is necessary as adapter sequences can be erroneously integrated into assemblies.
Conclusions: Here we describe the nature of adapter contaminated reads, their consequences in assembly, and present HiFiAdapterFilt, a simple and memory efficient solution for removing adapter contaminated reads prior to assembly.
- Sponsoring Organization:
- USDOE
- OSTI ID:
- 1846047
- Journal Information:
- BMC Genomics, Vol. 23, Issue 1; ISSN 1471-2164
- Publisher:
- Springer Science + Business Media
- Country of Publication:
- United Kingdom
- Language:
- English
Similar Records
polishCLR: A Nextflow Workflow for Polishing PacBio CLR Genome Assemblies
Long-read, whole-genome shotgun sequence data for five model organisms