A Comparison of Ten Polygenic Score Methods for Psychiatric Disorders Applied Across Multiple Cohorts

Published on May 4, 2021 in Biological Psychiatry (12.095)
DOI: 10.1016/J.BIOPSYCH.2021.04.018
Peter M. Visscher (UQ: University of Queensland), estimated H-index: 146
Naomi R. Wray (UQ: University of Queensland), estimated H-index: 102
Abstract

Background: Polygenic scores (PGSs), which assess the genetic risk of individuals for a disease, are calculated as a weighted count of risk alleles identified in genome-wide association studies (GWASs). PGS methods differ in which DNA variants are included and the weights assigned to them; some require an independent tuning sample to help inform these choices. PGSs are evaluated in independent target cohorts with known disease status. Variability between target cohorts is observed in applications to real data sets, which could reflect a number of factors, e.g., phenotype definition or technical factors.

Methods: The Psychiatric Genomics Consortium working groups for schizophrenia (SCZ) and major depressive disorder (MDD) bring together many independently collected case-control cohorts. We used these resources (31K SCZ cases, 41K controls; 248K MDD cases, 563K controls) in repeated application of leave-one-cohort-out meta-analyses, each used to calculate and evaluate PGS in the left-out (target) cohort. Ten PGS methods (the baseline PC+T method and nine methods that model genetic architecture more formally: SBLUP, LDpred2-Inf, LDpred-funct, LDpred2, Lassosum, PRS-CS, PRS-CS-auto, SBayesR, MegaPRS) are compared.

Results: Compared with PC+T, the other nine methods give higher prediction statistics, significantly so for MegaPRS, LDpred2, and SBayesR: up to 9.2% of variance in liability for SCZ across 30 target cohorts, an increase of 44%. For MDD across 26 target cohorts, these statistics were 3.5% and 59%, respectively.

Conclusions: Although the methods that more formally model genetic architecture have similar performance, MegaPRS, LDpred2, and SBayesR rank highest in most comparisons and are recommended in applications to psychiatric disorders.
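The abstract's definition of a PGS as a weighted count of risk alleles can be illustrated with a minimal sketch. This is not the pipeline used in the paper: the function name `polygenic_score`, the three toy variants, the weights, and the individual labels are all hypothetical, chosen only to show the arithmetic. In practice, the methods compared above differ in which variants receive non-zero weights and how those weights are derived from GWAS summary statistics.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted count of risk alleles for one individual.

    dosages: risk-allele counts per variant (0, 1 or 2 for hard-called genotypes).
    weights: per-allele weights for the same variants, in the same order.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical toy data: three variants with illustrative per-allele weights.
weights = [0.5, -0.25, 1.0]
genotypes = {
    "ind_A": [0, 1, 2],
    "ind_B": [2, 2, 0],
}

# One score per individual: sum over variants of (allele count * weight).
scores = {name: polygenic_score(g, weights) for name, g in genotypes.items()}
print(scores)
```

A variant excluded by a given method simply has weight 0 in this representation, so the same function covers both sparse and dense weightings.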
13 Authors (Guiyan Ni, ..., Naomi R. Wray)
#1 Restuadi Restuadi (UQ: University of Queensland), H-Index: 5
#2 Fleur C. Garton (UQ: University of Queensland), H-Index: 16
Last. Allan F. McRae (UQ: University of Queensland), H-Index: 54
(26 authors in total)
Amyotrophic Lateral Sclerosis (ALS) is recognised to be a complex neurodegenerative disease involving both genetic and non-genetic risk factors. The underlying causes and risk factors for the majority of cases remain unknown; however, ever-larger genetic data studies and methodologies promise an enhanced understanding. Recent analyses using published summary statistics from the largest ALS genome-wide association study (GWAS) (20,806 ALS cases and 59,804 healthy controls) identified that schizop...
3 Citations

#1 Florian Privé (AU: Aarhus University), H-Index: 9
#2 Julyan Arbel (IRIA: French Institute for Research in Computer Science and Automation), H-Index: 8
Last. Bjarni J. Vilhjálmsson (AU: Aarhus University), H-Index: 25
(3 authors in total)
MOTIVATION Polygenic scores have become a central tool in human genetics research. LDpred is a popular method for deriving polygenic scores based on summary statistics and a matrix of correlation between genetic variants. However, LDpred has limitations that may reduce its predictive performance. RESULTS Here we present LDpred2, a new version of LDpred that addresses these issues. We also provide two new options in LDpred2: a "sparse" option that can learn effects that are exactly 0, and an "aut...
32 Citations

#1 Graham K. Murray (University of Cambridge), H-Index: 43
#2 Tian Lin (UQ: University of Queensland), H-Index: 11
Last. Naomi R. Wray (UQ: University of Queensland), H-Index: 102
(6 authors in total)
Importance Polygenic risk scores (PRS) are predictors of the genetic susceptibility to diseases, calculated for individuals as weighted counts of thousands of risk variants in which the risk variants and their weights have been identified in genome-wide association studies. Polygenic risk scores show promise in aiding clinical decision-making in many areas of medical practice. This review evaluates the potential use of PRS in psychiatry. Observations On their own, PRS will never be able to estab...
32 Citations

#1 Naomi R. Wray (UQ: University of Queensland), H-Index: 102
#2 Tian Lin (UQ: University of Queensland), H-Index: 11
Last. Peter M. Visscher (UQ: University of Queensland), H-Index: 146
(7 authors in total)
Importance Polygenic risk scores (PRS) are predictors of the genetic susceptibilities of individuals to diseases. All individuals have DNA risk variants for all common diseases, but genetic susceptibility differences between people reflect the cumulative burden of these. Polygenic risk scores for an individual are calculated as weighted counts of thousands of risk variants that they carry, where the risk variants and their weights have been identified in genome-wide association studies. Here, we...
45 Citations

#1 Qian Zhang (UQ: University of Queensland), H-Index: 111
#7 Edoardo Marcora (ISMMS: Icahn School of Medicine at Mount Sinai), H-Index: 14
Last. Peter M. Visscher, H-Index: 146
(22 authors in total)
Genetic association studies have identified 44 common genome-wide significant risk loci for late-onset Alzheimer’s disease (LOAD). However, LOAD genetic architecture and prediction are unclear. Here we estimate the optimal P-threshold (Poptimal) of a genetic risk score (GRS) for prediction of LOAD in three independent datasets comprising 676 cases and 35,675 family history proxy cases. We show that the discriminative ability of GRS in LOAD prediction is maximised when selecting a small number of...
30 Citations

#1 Qianqian Zhang (AU: Aarhus University), H-Index: 1
#2 Florian Privé (AU: Aarhus University), H-Index: 9
Last. Doug Speed (AU: Aarhus University), H-Index: 21
(4 authors in total)
At present, most tools for constructing genetic prediction models begin with the assumption that all genetic variants contribute equally towards the phenotype. However, this represents a sub-optimal model for how heritability is distributed across the genome. Here we construct prediction models for 14 phenotypes from the UK Biobank (200,000 individuals per phenotype) using four of the most popular prediction tools: lasso, ridge regression, Bolt-LMM and BayesR. When we improve the assumed heritab...
8 Citations

#1 Diana O. Perkins (UNC: University of North Carolina at Chapel Hill), H-Index: 87
#2 Loes M. Olde Loohuis (UCLA: University of California, Los Angeles), H-Index: 21
Last. Scott W. Woods, H-Index: 105
(16 authors in total)
Objective: The 2-year risk of psychosis in persons who meet research criteria for a high-risk syndrome is about 15%–25%; improvements in risk prediction accuracy would benefit the development and im...
37 Citations

#1 Kristina Dobrindt, H-Index: 3
#2 Hanwen Zhang, H-Index: 5
Last. Kristen J. Brennand, H-Index: 43
(16 authors in total)
3 Citations

#1 Florian Privé (AU: Aarhus University), H-Index: 9
#2 Bjarni J. Vilhjálmsson (AU: Aarhus University), H-Index: 25
Last. Michael G. B. Blum (CNRS: Centre national de la recherche scientifique), H-Index: 31
(4 authors in total)
Polygenic prediction has the potential to contribute to precision medicine. Clumping and thresholding (C+T) is a widely used method to derive polygenic scores. When using C+T, several p value thresholds are tested to maximize predictive ability of the derived polygenic scores. Along with this p value threshold, we propose to tune three other hyper-parameters for C+T. We implement an efficient way to derive thousands of different C+T scores corresponding to a grid over four hyper-parameters. For ...
39 Citations

#1 Luke R. Lloyd-Jones (UQ: University of Queensland), H-Index: 13
#2 Jian Zeng (UQ: University of Queensland), H-Index: 16
Last. Peter M. Visscher (UQ: University of Queensland), H-Index: 146
(15 authors in total)
Accurate prediction of an individual’s phenotype from their DNA sequence is one of the great promises of genomics and precision medicine. We extend a powerful individual-level data Bayesian multiple regression model (BayesR) to one that utilises summary statistics from genome-wide association studies (GWAS), SBayesR. In simulation and cross-validation using 12 real traits and 1.1 million variants on 350,000 individuals from the UK Biobank, SBayesR improves prediction accuracy relative to commonl...
106 Citations

Cited By (4)

#3 Dorret I. Boomsma (VU: VU University Amsterdam), H-Index: 184
Background and purpose: The ENIGMA-EEG working group was established to enable large-scale international collaborations among cohorts that investigate the genetics of brain function measured with electroencephalography (EEG). In this perspective, we will discuss why analyzing the genetics of functional brain activity may be crucial for understanding how neurological and psychiatric liability genes affect the brain. Methods: We summarize how we have performed our currently largest ge...
1 Citation

#1 Zhiqiang Sha (MPG: Max Planck Society), H-Index: 1
#2 Antonietta Pepe (Commissariat à l'énergie atomique et aux énergies alternatives), H-Index: 7
Last. Clyde Francks (MPG: Max Planck Society), H-Index: 57
(10 authors in total)
Roughly 10% of the human population is left-handed, and this rate is increased in some brain-related disorders. The neuroanatomical correlates of hand preference have remained equivocal. We re-sampled structural brain image data from 28,802 right-handers and 3,062 left-handers (UK Biobank population dataset) to a symmetrical surface template, and mapped asymmetries for each of 8,681 vertices across the cerebral cortex in each individual. Left-handers and right-handers showed average differences ...

#1 Mohammad Ahangari (VCU: Virginia Commonwealth University)
#2 Amanda E. Gentry (VCU: Virginia Commonwealth University), H-Index: 4
Last. Brien P. Riley (VCU: Virginia Commonwealth University), H-Index: 62
(9 authors in total)
Importance: Multiplex schizophrenia families have higher recurrence risk of schizophrenia compared to the families of singleton cases in the population, but the source of increased familial recurrence risk is unknown. Determining the source of this observation is essential, as it will define the relative focus on common versus rare genetic variation in case-control and family studies of schizophrenia. Objective: To evaluate the role of common risk variation in the recurrence risk of schizop...

#1 Kit K. Elam (IU: Indiana University), H-Index: 13
#2 Chung Jung Mun (Johns Hopkins University), H-Index: 9
Last. Thao Ha (ASU: Arizona State University), H-Index: 16
(4 authors in total)
A substance use offense reflects an encounter with law enforcement and the court system in response to breaking the law which may increase risk for substance use problems later in life. Individuals may also be at risk for substance use offending and substance use problems based on genetic predisposition. We examined a mediation model in which polygenic risk for aggression predicted adult substance use disorder diagnoses (SUD) via substance use offending in emerging adulthood. In addition, we exp...
1 Citation