DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Similarities and differences between variants called with human reference genome HG19 or HG38

Journal Article · BMC Bioinformatics
 [1];  [2];  [3];  [4];  [3];  [5];  [3];  [3];  [6];  [7];  [3];  [4];  [3];  [3]
  1. U.S. Food and Drug Administration, Jefferson, AR (United States); DOE/OSTI
  2. Immuneering Corporation, Cambridge, MA (United States)
  3. U.S. Food and Drug Administration, Jefferson, AR (United States)
  4. Fudan Univ., Shanghai (China)
  5. National Institutes of Health (NIH), Bethesda, MD (United States)
  6. US Army Engineer Research and Development Center, Vicksburg, MS (United States)
  7. Univ. of Southern Mississippi, Hattiesburg, MS (United States)

Reference genome selection is a prerequisite for successful analysis of next generation sequencing (NGS) data. Current practice employs one of the two most recent human reference genome versions: HG19 or HG38. To date, the impact of genome version on SNV identification has not been rigorously assessed. We conducted an analysis comparing the SNVs identified with HG19 versus HG38, leveraging whole genome sequencing (WGS) data from the genome-in-a-bottle (GIAB) project. First, SNVs were called using 26 different bioinformatics pipelines with either HG19 or HG38. Next, two tools were used to convert the called SNVs between HG19 and HG38. Lastly, we calculated conversion rates, analyzed discordant rates between SNVs called with HG19 and those called with HG38, and characterized the discordant SNVs. The conversion rates from HG38 to HG19 (average 95%) were lower than the conversion rates from HG19 to HG38 (average 99%). The conversion rates varied slightly among the calling pipelines. Around 1.5% of SNVs were discordantly converted between HG19 and HG38. The conversions from HG38 to HG19 had more SNVs that failed conversion and more discordant SNVs than the opposite conversion (HG19 to HG38). Most of the discordant SNVs had low read depth, were low confidence SNVs as defined by GIAB, and/or were enriched in G/C alleles (52% observed versus 42% expected). A significant number of SNVs could not be converted between HG19 and HG38. Based on careful review of our comparisons, we recommend HG38 (the newer version) for NGS SNV analysis. In summary, our findings suggest caution when translating identified SNVs between different versions of the human reference genome.
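The two summary statistics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the SNV key layout, and the toy coordinates are all hypothetical, and the definition of "discordant" used here (a converted SNV that is absent from the call set produced directly on the target assembly) is an assumption, since the abstract does not spell out the exact criterion.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical SNV key: (chromosome, 1-based position, ref allele, alt allele).
Snv = Tuple[str, int, str, str]


def conversion_rate(lifted: Dict[Snv, Optional[Snv]]) -> float:
    """Fraction of source SNVs that received a coordinate on the target assembly."""
    if not lifted:
        return 0.0
    converted = sum(1 for target in lifted.values() if target is not None)
    return converted / len(lifted)


def discordant_rate(lifted: Dict[Snv, Optional[Snv]], native_calls: Set[Snv]) -> float:
    """Fraction of converted SNVs missing from the native call set.

    Assumption: "discordant" means the converted SNV does not match any SNV
    called directly against the target assembly.
    """
    converted = [target for target in lifted.values() if target is not None]
    if not converted:
        return 0.0
    discordant = sum(1 for target in converted if target not in native_calls)
    return discordant / len(converted)


# Toy data (invented coordinates): source SNV -> converted SNV, or None when
# the conversion tool could not map the position.
lifted_calls: Dict[Snv, Optional[Snv]] = {
    ("chr1", 100, "A", "G"): ("chr1", 150, "A", "G"),
    ("chr1", 200, "C", "T"): ("chr1", 250, "C", "T"),
    ("chr2", 300, "G", "C"): None,
    ("chr2", 400, "T", "A"): ("chr2", 455, "T", "A"),
}

# SNVs called directly against the target assembly (also invented).
native_calls: Set[Snv] = {
    ("chr1", 150, "A", "G"),
    ("chr2", 455, "T", "A"),
}

rate_converted = conversion_rate(lifted_calls)
rate_discordant = discordant_rate(lifted_calls, native_calls)
print(f"conversion rate: {rate_converted:.2f}")
print(f"discordant rate: {rate_discordant:.2f}")
```

In this toy set, three of four SNVs convert and one of the three converted SNVs has no match in the native calls. Keying SNVs by position and alleles together means a coordinate that converts correctly but carries a different allele still counts as discordant, which is one plausible reading of the comparison described above.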

Research Organization:
Oak Ridge Associated Univ., Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
SC0014664
OSTI ID:
1626774
Journal Information:
BMC Bioinformatics, Vol. 20, Issue S2; ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
