Understanding Polygenic Scores: A Genetic Guide
A look into polygenic scores and their implications for traits and health.
Polygenic scores are numerical summaries used to predict traits from a person's genetic makeup. They combine the small effects of many genetic variants that influence traits such as height, weight, or the risk of developing certain health conditions.
How Are Polygenic Scores Calculated?
To calculate a polygenic score, researchers first study a large number of individuals to identify genetic variants linked to a specific trait. The process typically involves several steps:
- Collecting Data: Scientists gather genetic data from a large group of people, often using a method called a genome-wide association study (GWAS). This study looks for links between specific genetic variants and traits. 
- Estimating Effects: For each genetic variant found, researchers measure its effect on the trait of interest. This effect is averaged over different environments and backgrounds of the individuals in the study. 
- Creating Scores: Once these effects are known, they can be combined into a polygenic score for each individual. The score is a weighted sum: for each variant, the number of copies of the trait-associated allele the individual carries is multiplied by that variant's estimated effect, and the results are added up across variants.
- Making Predictions: If the effects are estimated correctly, these scores can give an estimate of what the individual's trait value might be, assuming that genetic and environmental effects combine in a straightforward (roughly additive) way.
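As a minimal sketch of the score-building step described above, the Python snippet below computes a polygenic score as a weighted sum of allele counts. The function name, the effect estimates, and the genotypes are hypothetical values chosen for illustration. Real pipelines also handle missing genotypes, allele alignment, and variant selection, none of which is shown here.

```python
def polygenic_score(allele_counts, effect_estimates):
    """Weighted sum of allele counts (0, 1 or 2 copies per variant)."""
    if len(allele_counts) != len(effect_estimates):
        raise ValueError("need exactly one effect estimate per variant")
    return sum(count * effect for count, effect in zip(allele_counts, effect_estimates))


# Hypothetical effect estimates for three variants (illustrative values only).
effects = [0.5, -0.25, 1.0]

# Allele counts for two hypothetical individuals at the same three variants.
person_a = [2, 0, 1]
person_b = [0, 2, 0]

score_a = polygenic_score(person_a, effects)  # 2*0.5 + 0*(-0.25) + 1*1.0 = 2.0
score_b = polygenic_score(person_b, effects)  # 2*(-0.25) = -0.5
print(score_a, score_b)
```

The scores are only meaningful relative to one another or to a reference distribution; a single raw value carries no units by itself.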
The Promises and Pitfalls
While polygenic scores can be useful, they come with challenges. Here are some key points to consider:
- Environmental Factors: The environments in which individuals live can affect their traits. If the individuals in the study come from very different backgrounds, it can make it hard to separate the effects of genes from the effects of their environments. 
- Ancestry and Bias: Differences in ancestry can introduce biases. For example, if certain ancestry groups are more likely to live in specific environments, the estimated effects of genetic variants can become skewed. 
- Correlated Traits: A trait can be correlated with ancestry for non-genetic reasons, so the observed patterns in polygenic scores might reflect environmental differences as much as genetic differences.
- Geographic Clustering: Studies often find that polygenic scores show patterns based on geography, even when researchers try to control for ancestry. This leads to questions about whether these patterns are due to real genetic differences or biases in the data. 
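The way ancestry and environment can jointly skew an effect estimate can be illustrated with a toy simulation. The sketch below assumes two groups that differ in both allele frequency and environment, and a variant with no effect on the trait at all. Every number in it (group sizes, frequencies, the environmental shift, the seed) is an assumption made for illustration, not a value from the study.

```python
import random


def slope(x, y):
    """Least-squares slope from regressing y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(1)  # fixed seed so the sketch is repeatable
n_per_group = 2000

# Assumed toy settings: (allele frequency, environmental shift) for each group.
settings = [(0.2, 0.0), (0.8, 1.0)]

groups = []
for freq, env_shift in settings:
    # Each genotype is the number of copies (0, 1 or 2) of the allele.
    genotypes = [sum(rng.random() < freq for _ in range(2)) for _ in range(n_per_group)]
    # The phenotype depends on environment and noise only: no genetic effect.
    phenotypes = [env_shift + rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    groups.append((genotypes, phenotypes))

pooled_g = [g for genotypes, _ in groups for g in genotypes]
pooled_y = [y for _, phenotypes in groups for y in phenotypes]

pooled_slope = slope(pooled_g, pooled_y)             # clearly positive
within_slopes = [slope(g, y) for g, y in groups]     # both close to zero
print(pooled_slope, within_slopes)
```

Pooling the two groups makes the variant look associated with the trait, even though within each group the estimated effect is close to zero. The apparent effect comes entirely from the allele being more common in the group with the higher-valued environment.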
Application in Medicine and Research
Polygenic scores hold promise in both basic scientific research and medical applications. Here’s how they could be useful:
- Predicting Health Risks: By using polygenic scores, doctors may predict a patient's risk for certain diseases based on their genetic information. This could lead to more personalized treatments and prevention strategies. 
- Understanding Evolution: Researchers can use polygenic scores to study how traits have evolved over time and how natural selection might have influenced certain genetic variations. 
- Addressing Health Disparities: If certain groups are known to have different polygenic scores, it could help in understanding health disparities between different populations, although caution must be taken to interpret these differences correctly. 
Challenges in Using Polygenic Scores
Despite their potential, there are several challenges to using polygenic scores effectively:
- Population Stratification: When a study sample contains groups that differ in both ancestry and environment, the estimated effects of genetic variants can be skewed. Relatedly, if a polygenic score is calculated from one group and used to predict traits in another, the accuracy can drop significantly.
- Estimation Errors: Researchers need to be careful in estimating the effects of genetic variants. Errors in these estimations can lead to biased polygenic scores, which in turn affect predictions. 
- Need for Diverse Samples: For polygenic scores to be accurate, they need to be built from samples that cover diverse genetic backgrounds. This diversity helps in making reliable predictions across different populations.
- Complexity of Genetic Interactions: Traits often result from complex interactions between multiple genes and environmental factors. Simplifying these interactions into a score can overlook important nuances. 
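To see why small estimation errors matter once they are summed into a score, consider the expected gap in mean score between two groups, which is the sum over variants of 2 × (allele-frequency difference) × (effect estimate). The sketch below is a toy calculation under assumed values: all true effects are zero, and the frequency differences, error sizes, and seed are hypothetical.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable
n_variants = 1000

# Assumed allele-frequency difference (group B minus group A) at each variant.
freq_diff = [rng.uniform(-0.1, 0.1) for _ in range(n_variants)]


def mean_score_gap(effect_estimates):
    """Expected difference in mean score between groups.

    The expected allele count is 2 * frequency, so each variant contributes
    2 * freq_diff * effect_estimate to the gap.
    """
    return sum(2.0 * d * b for d, b in zip(freq_diff, effect_estimates))


# True effects are zero here, so every "estimate" below is pure error.
# Case 1: errors aligned with the frequency difference (confounded estimates).
aligned_errors = [0.5 * d for d in freq_diff]

# Case 2: the same error magnitudes, shuffled so they are unrelated to freq_diff.
shuffled_errors = aligned_errors[:]
rng.shuffle(shuffled_errors)

gap_aligned = mean_score_gap(aligned_errors)
gap_shuffled = mean_score_gap(shuffled_errors)
print(gap_aligned, gap_shuffled)
```

Errors of identical size produce very different outcomes: when they line up with ancestry-related frequency differences they accumulate across variants into a sizeable gap between groups, whereas unrelated errors largely cancel out.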
Efforts to Improve Polygenic Scores
To address the challenges with polygenic scores, scientists are exploring various approaches:
- Improving Data Collection: Gathering data from a wide range of populations can help ensure that polygenic scores are more accurate and broadly applicable. 
- New Statistical Methods: Researchers are continually developing advanced statistical methods to better account for population structure and ensure that biases are minimized. 
- Including Environmental Data: Incorporating environmental data alongside genetic data might help provide a clearer picture of how both factors influence traits. 
- Testing in Real-World Settings: Beyond the lab, testing polygenic scores in real-world conditions can help validate their usefulness and accuracy. 
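One widely used way to account for population structure is to include an ancestry covariate when estimating each variant's effect. The sketch below illustrates the idea with a single ancestry axis that is assumed to be known, standing in for something like a principal component of the genotype data. The simulation settings and helper names are hypothetical, and adjusting via residuals is just one convenient way to fit a regression with a covariate.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """Least-squares slope from regressing y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(values, covariate):
    """Remove the part of `values` linearly explained by `covariate` (with intercept)."""
    b = slope(covariate, values)
    mv, mc = mean(values), mean(covariate)
    return [v - mv - b * (c - mc) for v, c in zip(values, covariate)]


rng = random.Random(3)  # fixed seed so the sketch is repeatable
n = 4000
true_effect = 0.1  # assumed true per-allele effect

# Assumed ancestry axis in [0, 1]; it shifts both allele frequency and environment.
ancestry = [rng.random() for _ in range(n)]
genotype = [sum(rng.random() < 0.2 + 0.6 * a for _ in range(2)) for a in ancestry]
phenotype = [true_effect * g + 2.0 * a + rng.gauss(0.0, 1.0)
             for g, a in zip(genotype, ancestry)]

# Unadjusted estimate: absorbs the environmental trend along the ancestry axis.
naive = slope(genotype, phenotype)

# Adjusted estimate: regress both variables on ancestry first, then compare residuals.
adjusted = slope(residualize(genotype, ancestry), residualize(phenotype, ancestry))
print(naive, adjusted)
```

In this toy setting the unadjusted estimate is several times larger than the true effect, while the adjusted estimate lands near it. The correction works here only because the relevant ancestry axis is known exactly, which is rarely the case in practice.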
The Future of Polygenic Scores
As research continues, polygenic scores may become more reliable tools for predicting traits in individuals. This could lead to breakthroughs in personalized medicine and a better understanding of human genetics. However, it’s crucial that scientists remain aware of the limitations and biases that can arise in their calculations and interpretations.
In conclusion, polygenic scores represent a promising method for linking genetics to traits and health outcomes. As technology and methods improve, these scores may play an increasingly important role in both research and medicine, helping to tailor treatments to individuals based on their genetic profiles. Nonetheless, caution and rigorous methodologies are essential to ensure the validity and applicability of these scores across diverse populations.
Title: Testing for differences in polygenic scores in the presence of confounding
Abstract: Polygenic scores have become an important tool in human genetics, enabling the prediction of individuals' phenotypes from their genotypes. Understanding how the pattern of differences in polygenic score predictions across individuals intersects with variation in ancestry can provide insights into the evolutionary forces acting on the trait in question, and is important for understanding health disparities. However, because most polygenic scores are computed using effect estimates from population samples, they are susceptible to confounding by both genetic and environmental effects that are correlated with ancestry. The extent to which this confounding drives patterns in the distribution of polygenic scores depends on patterns of population structure in both the original estimation panel and in the prediction/test panel. Here, we use theory from population and statistical genetics, together with simulations, to study the procedure of testing for an association between polygenic scores and axes of ancestry variation in the presence of confounding. We use a general model of genetic relatedness to describe how confounding in the estimation panel biases the distribution of polygenic scores in a way that depends on the degree of overlap in population structure between panels. We then show how this confounding can bias tests for associations between polygenic scores and important axes of ancestry variation in the test panel. Specifically, for any given test, there exists a single axis of population structure in the GWAS panel that needs to be controlled for in order to protect the test. Based on this result, we propose a new approach for directly estimating this axis of population structure in the GWAS panel. We then use simulations to compare the performance of this approach to the standard approach in which the principal components of the GWAS panel genotypes are used to control for stratification.
Author Summary: Complex traits are influenced by both genetics and the environment. Human geneticists increasingly use polygenic scores, calculated as the weighted sum of trait-associated alleles, to predict genetic effects on a phenotype. Differences in polygenic scores across groups would therefore seem to indicate differences in the genetic basis of the trait, which are of interest to researchers across disciplines. However, because polygenic scores are usually computed using effect sizes estimated using population samples, they are susceptible to confounding due to both the genetic background and the environment. Here, we use theory from population and statistical genetics, together with simulations, to study how environmental and background genetic effects can confound tests for association between polygenic scores and axes of ancestry variation. We then develop a simple method to protect these tests from confounding, which we evaluate, alongside standard methods, across a range of possible situations. Our work helps clarify how bias in the distribution of polygenic scores is produced and provides insight to researchers wishing to protect their analyses from confounding.
Authors: Jeremy J Berg, J. Blanc
Last Update: 2024-06-26 00:00:00
Language: English
Source URL: https://www.biorxiv.org/content/10.1101/2023.03.12.532301
Source PDF: https://www.biorxiv.org/content/10.1101/2023.03.12.532301.full.pdf
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to bioRxiv for use of its open access interoperability.