Structure-informed clustering for population stratification in association studies
Journal Article · BMC Bioinformatics
- IBM, Yorktown Heights, NY (United States). Thomas J. Watson Research Center
- IBM, Yorktown Heights, NY (United States). Thomas J. Watson Research Center; Purdue University, West Lafayette, IN (United States)
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Purdue University, West Lafayette, IN (United States)
Background: Identifying variants associated with complex traits is challenging in genetic association studies because of linkage disequilibrium (LD) between genetic variants and because of population stratification unrelated to disease risk. Existing methods for population structure correction use principal component analysis or linear mixed models with a random effect when modeling associations between a trait of interest and genetic markers. However, owing to stringent significance thresholds and latent interactions between markers, these methods often fail to detect genuinely associated variants. Results: To overcome this, we propose CluStrat, which corrects for complex, arbitrarily structured populations while leveraging the LD-induced distances between genetic markers. It performs agglomerative hierarchical clustering using the Mahalanobis distance, computed with the covariance matrix of the markers. In simulation studies, we show that our method outperforms existing methods in detecting true causal variants. Applying CluStrat to the WTCCC2 and UK Biobank cohorts, we found biologically relevant associations in schizophrenia and myocardial infarction. CluStrat was also able to correct for population structure in polygenic adaptation of height in Europeans. Conclusions: CluStrat highlights the advantages of biologically relevant distance metrics, such as the Mahalanobis distance, which, in the presence of LD, captures cryptic interactions within populations better than the Euclidean distance does.
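The core step described in the abstract, agglomerative hierarchical clustering under a Mahalanobis distance built from the marker covariance matrix, can be sketched as follows. This is a minimal illustration under stated assumptions, not CluStrat's implementation. It assumes that distances are computed between individuals (rows of a genotype matrix), that the covariance matrix is inverted with an eigenvalue-truncated pseudo-inverse, and that clusters are merged by average linkage. The helper names `mahalanobis_distances`, `euclidean_distances`, and `agglomerative` are hypothetical.

```python
import numpy as np


def mahalanobis_distances(X, tol=1e-10):
    """Pairwise Mahalanobis distances between the rows (individuals) of X.

    X: (n individuals x p markers) genotype matrix, e.g. allele counts 0/1/2.
    The p x p marker covariance matrix stands in for an LD matrix. It is
    inverted with an eigenvalue-truncated pseudo-inverse (an assumption of
    this sketch), so singular cases such as markers in perfect LD or more
    markers than individuals are still handled.
    """
    Xc = X - X.mean(axis=0)                        # center each marker
    cov = Xc.T @ Xc / (X.shape[0] - 1)             # p x p marker covariance
    w, V = np.linalg.eigh(cov)                     # symmetric eigendecomposition
    keep = w > tol * w.max()                       # drop numerically zero directions
    prec = (V[:, keep] / w[keep]) @ V[:, keep].T   # pseudo-inverse of cov
    diff = X[:, None, :] - X[None, :, :]           # n x n x p pairwise differences
    d2 = np.einsum("ijk,kl,ijl->ij", diff, prec, diff)
    return np.sqrt(np.maximum(d2, 0.0))            # clip tiny negative round-off


def euclidean_distances(X):
    """Pairwise Euclidean distances; every marker counts independently."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def agglomerative(D, n_clusters):
    """Naive average-linkage agglomerative clustering on a distance matrix D.

    Starts with one cluster per individual and repeatedly merges the pair of
    clusters with the smallest mean pairwise distance. Returns integer labels.
    """
    clusters = [[i] for i in range(D.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                            # b > a, so index a stays valid
    labels = np.empty(D.shape[0], dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```

As a toy check of the LD argument, appending an exact copy of one marker (perfect LD) leaves `mahalanobis_distances` unchanged up to round-off, because the pseudo-inverse discounts the redundant column, whereas `euclidean_distances` counts that marker twice. A call such as `agglomerative(mahalanobis_distances(X), n_clusters=3)` then yields one label per individual. The merge loop here scales poorly (at least cubically in the number of individuals); for real cohorts, `scipy.cluster.hierarchy.linkage` with `method="average"` on a condensed distance matrix would be the usual choice.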
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- National Science Foundation (NSF); USDOE
- Grant/Contract Number:
- AC05-00OR22725
- OSTI ID:
- 2471413
- Journal Information:
- BMC Bioinformatics, Vol. 24, Issue 1; ISSN 1471-2105
- Publisher:
- BioMed Central
- Country of Publication:
- United States
- Language:
- English