Inference of Chromosome-Length Haplotypes Using Genomic Data of Three or a Few More Single Gametes
- Univ. of California, Riverside, CA (United States). Dept. of Botany and Plant Sciences; Univ. of California, Riverside, CA (United States). Graduate Program in Genetics, Genomics, and Bioinformatics
- Univ. of California, Riverside, CA (United States). Dept. of Botany and Plant Sciences
- Univ. of California, Riverside, CA (United States). Graduate Program in Genetics, Genomics, and Bioinformatics
- Univ. of California, Riverside, CA (United States). Dept. of Botany and Plant Sciences; Wayne State Univ., Detroit, MI (United States). Center for Molecular Medicine and Genetics
- Huazhong Agricultural Univ., Wuhan (China). College of Plant Science and Technology, Statistical Genomics Lab
- Yangzhou Univ. (China)
- South China Univ. of Technology, Guangzhou (China)
- Guizhou Provincial People’s Hospital, Guizhou (China). Dept. of Urology
- Univ. of California, Riverside, CA (United States). Dept. of Botany and Plant Sciences; South China Univ. of Technology, Guangzhou (China)
- Univ. of California, Riverside, CA (United States). Dept. of Botany and Plant Sciences; Bowdoin College, Brunswick, ME (United States). Dept. of Mathematics
Compared with genomic data of individual markers, haplotype data provide higher resolution for DNA variants, advancing our knowledge in genetics and evolution. Although many computational and experimental phasing methods have been developed for analyzing diploid genomes, it remains challenging to reconstruct chromosome-scale haplotypes at low cost, which constrains the utility of this valuable genetic resource. Gamete cells, the natural packaging of haploid complements, are ideal materials for phasing entire chromosomes because most of the haplotypic allele combinations are preserved in them. Therefore, compared with current diploid-based phasing methods, using haploid genomic data of single gametes may substantially reduce the complexity of inferring the donor’s chromosomal haplotypes. In this study, we developed the first easy-to-use R package, Hapi, for inferring chromosome-length haplotypes of individual diploid genomes with only a few gametes. Hapi outperformed other phasing methods when analyzing both simulated and real single gamete cell sequencing data sets. The results also suggested that chromosome-scale haplotypes may be inferred by using as few as three gametes, which pushes the number of gametes required to what is likely its lower limit. Single gamete cell sequencing technology combined with the cost-effective Hapi method will make large-scale haplotype-based genetic studies feasible and affordable, promoting the use of haplotype data in a wide range of research.
- Research Organization:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC); National Institute of Food and Agriculture (NIFA); National Key Basic Research Program of China; National Natural Science Foundation of China (NSFC)
- Grant/Contract Number:
- AC02-05CH11231; 2013-67013-21110; 2015CB553706; 81571427; 201803040001; 201707010291; 81660426; 2017-5803
- OSTI ID:
- 1816248
- Journal Information:
- Molecular Biology and Evolution, Vol. 37, Issue 12; ISSN 0737-4038
- Publisher:
- Oxford University Press
- Country of Publication:
- United States
- Language:
- English