DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Overestimated prediction using polygenic prediction derived from summary statistics

Journal Article · BMC Genomic Data (Online)
 [1];  [2];  [1];  [3];  [4];  [5];  [6];  [7];  [5]
  1. Columbia Univ., New York, NY (United States)
  2. Stony Brook Univ., NY (United States)
  3. Sungkyunkwan University, Seoul (South Korea). Samsung Medical Center, Samsung Advanced Institute for Health Sciences & Technology (SAHIST)
  4. California Institute of Technology (CalTech), Pasadena, CA (United States)
  5. National Health Insurance Service Ilsan Hospital, Goyang (South Korea). Dementia Center, Department of Physical Medicine and Rehabilitation
  6. Seoul National Univ. (Korea, Republic of). Brain and Cognitive Sciences, AI Institute
  7. Brookhaven National Laboratory (BNL), Upton, NY (United States)

When a polygenic risk score (PRS) is derived from summary statistics, independence between the discovery and test sets cannot be verified. We compared PRSs derived from raw genetic data (denoted rPRS) with PRSs derived from the summary statistics of the International Genomics of Alzheimer's Project (IGAP) (denoted sPRS). Two highly heritable traits in UK Biobank, hypertension and height, were used to establish a reference scale for PRS effect sizes. An sPRS excluding APOE, derived from IGAP, yields ΔAUC and ΔR² of 0.051 ± 0.013 and 0.063 ± 0.015 for the Alzheimer's Disease Sequencing Project (ADSP) and 0.060 and 0.086 for the Accelerating Medicine Partnership - Alzheimer's Disease (AMP-AD). In UK Biobank, rPRS performance for hypertension, assuming discovery and test sets of similar size, is 0.0036 ± 0.0027 (ΔAUC) and 0.0032 ± 0.0028 (ΔR²); for height, ΔR² is 0.029 ± 0.0037. Given the high heritability of hypertension and height and the large sample size of UK Biobank, the sPRS results from the AD databases are inflated. Independence between discovery and test sets is a well-known basic requirement for PRS studies. However, many PRS studies cannot satisfy this requirement, because the discovery and test samples cannot be compared directly when only summary statistics are available. Thus, for sPRS, potential sample duplication between the discovery and test sets should be carefully considered when both are drawn from the same ethnic group.
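A small simulation can make the overlap effect concrete. The Python sketch below is a hypothetical illustration, not the authors' pipeline. It assumes independent SNPs, an additive quantitative trait with SNP heritability of about 0.3, per-SNP marginal OLS slopes from a discovery set standing in for published summary statistics, and a PRS that is simply the β̂-weighted sum of allele counts (no clumping, P-value thresholding, APOE handling, or covariates). All function and variable names (`simulate`, `marginal_gwas`, `N_DISC`, and so on) are invented for this example. It reports R², which equals ΔR² over an intercept-only baseline, and the Mann-Whitney AUC for a binary trait defined as the top 20% of the phenotype. Both are computed first on an independent test set and then on an equally sized test set in which half the samples also took part in the discovery GWAS.

```python
import random

# Hypothetical sketch: sample overlap between GWAS discovery and PRS test sets.
# Assumptions: independent SNPs, additive effects, SNP heritability ~ H2,
# marginal per-SNP OLS slopes stand in for published summary statistics.
N_DISC, N_TEST, M, H2 = 800, 800, 300, 0.3


def mean(xs):
    return sum(xs) / len(xs)


def simulate(n, m, h2, seed):
    """Return 0/1/2 genotypes and a quantitative trait for n people, m SNPs."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.1, 0.5) for _ in range(m)]
    effects = [rng.gauss(0.0, 1.0) for _ in range(m)]
    geno = [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n)]
    g = [sum(x * b for x, b in zip(row, effects)) for row in geno]
    g_mean = mean(g)
    var_g = mean([(v - g_mean) ** 2 for v in g])
    # Noise variance chosen so that var_g / (var_g + var_e) is about h2.
    sd_e = (var_g * (1 - h2) / h2) ** 0.5
    y = [v + rng.gauss(0.0, sd_e) for v in g]
    return geno, y


def marginal_gwas(geno, y):
    """Per-SNP OLS slope of y on allele count (the 'summary statistics')."""
    y_mean = mean(y)
    betas = []
    for j in range(len(geno[0])):
        col = [row[j] for row in geno]
        x_mean = mean(col)
        sxx = sum((x - x_mean) ** 2 for x in col)
        sxy = sum((x - x_mean) * (v - y_mean) for x, v in zip(col, y))
        betas.append(sxy / sxx if sxx > 0 else 0.0)
    return betas


def prs(geno, betas):
    """Beta-weighted sum of allele counts, no clumping or thresholding."""
    return [sum(x * b for x, b in zip(row, betas)) for row in geno]


def r_squared(a, b):
    """Squared Pearson r; equals Delta R^2 over an intercept-only baseline."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((z - mb) ** 2 for z in b)
    return sab * sab / (saa * sbb)


def auc(scores, labels):
    """Mann-Whitney AUC: P(random case outscores random control), ties = 1/2."""
    cases = [s for s, lab in zip(scores, labels) if lab]
    controls = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def evaluate(ids, score, y, labels):
    s = [score[i] for i in ids]
    return r_squared(s, [y[i] for i in ids]), auc(s, [labels[i] for i in ids])


geno, y = simulate(N_DISC + N_TEST, M, H2, seed=1)
cut = sorted(y)[int(0.8 * len(y))]
labels = [v >= cut for v in y]  # top ~20% of the trait treated as "cases"
disc = list(range(N_DISC))
test = list(range(N_DISC, N_DISC + N_TEST))

betas = marginal_gwas([geno[i] for i in disc], [y[i] for i in disc])
score = prs(geno, betas)

r2_indep, auc_indep = evaluate(test, score, y, labels)
# Same-size test set, but half of it was also used in the discovery GWAS.
overlap = disc[: N_TEST // 2] + test[: N_TEST // 2]
r2_overlap, auc_overlap = evaluate(overlap, score, y, labels)

print(f"independent test set: R^2 = {r2_indep:.3f}, AUC = {auc_indep:.3f}")
print(f"50% overlapping set:  R^2 = {r2_overlap:.3f}, AUC = {auc_overlap:.3f}")
```

Samples that were in the discovery set contributed their own phenotypic noise to the estimated effects, so their scores partly track that noise, and both metrics rise without any gain in true predictive signal. With raw genotypes, duplicated samples can be identified and removed before evaluation; with summary statistics alone, that check is not possible.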

Research Organization:
Brookhaven National Laboratory (BNL), Upton, NY (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research
Grant/Contract Number:
SC0012704
OSTI ID:
2006815
Report Number(s):
BNL-224815-2023-JAAM
Journal Information:
BMC Genomic Data (Online), Vol. 24, Issue 1; ISSN 2730-6844
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
