Introducing deepKin: A New Method for Measuring Genetic Relatedness

Table of Contents

Methods for Measuring Genetic Relationships
Introducing DeepKin: A New Approach
Understanding the Methods of DeepKin
Inferencing Relatedness with DeepKin
Guidelines for Using DeepKin
The Importance of Effective Number of Markers
Validating the Variance
Real-World Applications: UK Biobank
Key Findings and Conclusions

Understanding how individuals are related to each other is important in genetics and public health studies. It is especially crucial in genome-wide association studies (GWAS), where researchers examine many genetic markers across the whole genome, and when they estimate the risk for certain traits or diseases with a polygenic risk score (PRS). Traditionally, scientists would look at family trees to estimate how closely related people are. This method gives a good idea of the expected genetic similarity. However, with the rise of genome-wide single nucleotide polymorphism (SNP) data, researchers can now calculate realized genetic relationships from the data itself.

This shift to using SNP data faces some challenges. Different methods of measuring SNPs, along with differences in how data is checked for quality, can add noise. Therefore, inferring relationships from SNP data can be complicated.

Methods for Measuring Genetic Relationships

There are different ways to estimate how closely related people are using SNP data. Some methods use maximum likelihood approaches, while others use moments-based estimators. Although moments-based estimators may not be as precise, they are faster and easier to compute. Over the years, several factors that affect how relatedness is measured have been studied. One study looked into how relationships can vary due to random genetic sampling and genetic linkage.

Currently, many researchers use SNP-based measures in population studies, but there hasn't been as much focus on how much these measures vary. This sampling variation can significantly affect the power to distinguish closely related pairs from unrelated ones.

Static cut-off numbers are often used to decide if two samples are related. This can lead to mistakes, like false positives, when the variation in estimates is ignored. If researchers only rely on fixed cut-offs without considering how the data behaves, they might incorrectly label pairs as related.
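To make the risk concrete, here is a minimal sketch of how many unrelated pairs a fixed cut-off could flag by chance. It assumes that the estimate for an unrelated pair is roughly normal around zero; the cut-off of 0.1, the sample size, and the two noise levels are hypothetical numbers chosen for illustration, not values from the study.

```python
from math import erfc, sqrt

def expected_false_positives(n_samples, cutoff, null_sd):
    """Expected number of unrelated pairs whose estimate exceeds a fixed cut-off.

    Assumes the estimate for an unrelated pair is normal with mean 0 and
    standard deviation null_sd (an assumption of this sketch).
    """
    n_pairs = n_samples * (n_samples - 1) // 2
    # Upper-tail probability of a normal variable, written with erfc for accuracy.
    tail = 0.5 * erfc(cutoff / (null_sd * sqrt(2.0)))
    return n_pairs * tail

# Hypothetical setting: 1,000 samples, fixed cut-off 0.1, noisy vs. precise estimates.
for null_sd in (0.06, 0.01):
    fp = expected_false_positives(1000, 0.1, null_sd)
    print(f"null s.d. {null_sd}: about {fp:.3g} false positives expected")
```

The same cut-off is harmless when the estimates are precise and produces thousands of spurious "relatives" when they are noisy, which is why the variation of the estimate needs to enter the decision.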

Introducing DeepKin: A New Approach

The new method, called deepKin, offers a fresh way to measure relatedness using SNP data. This tool is different from earlier methods because it provides information about the sampling variation that comes with calculating relatedness. By using this new approach, deepKin can help researchers understand whether differences in relatedness are significant.

DeepKin focuses on three key concepts in estimating relatedness:

It sets a critical value that separates significant relatedness from non-significant relatedness.
It identifies the minimum number of genetic markers needed to spot a specific type of relative.
It shows how statistical power changes with the degree of relatedness being tested.

The team behind deepKin tested it through simulations and real data, showing its effectiveness. They also made deepKin available to researchers as an R package.

Understanding the Methods of DeepKin

A core aim of this study is to quantify the sampling variation of moments-based estimates of genetic relatedness. DeepKin uses an approach similar to that of the original KING method, but with different scaling factors. Researchers can create matrices to describe genetic relationships based on genotypic values.

The KING estimator computes relatedness using specific formulas, but its estimates only represent half of the actual relatedness expected. To clarify comparisons, researchers will often double the KING estimates.

However, realized genetic similarity is continuous and can take any value from 0 to 1. An estimate is therefore only meaningful alongside its sampling variance, which is why understanding that variance is crucial for the estimation.
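As an illustration of a moments-based estimate and of the factor of two between the kinship and relatedness scales, the sketch below scores one pair of individuals from 0/1/2 genotype counts. The function name and the exact scaling (squared genotype difference divided by its expected value for an unrelated pair) are assumptions made for this example; the text above does not give deepKin's exact formula.

```python
def relatedness_estimate(g1, g2, freqs):
    """Moment estimate of relatedness for one pair (hypothetical helper).

    g1, g2: genotype counts (0, 1 or 2) per marker for the two individuals.
    freqs:  reference allele frequency per marker, assumed known.
    """
    total = 0.0
    for a, b, p in zip(g1, g2, freqs):
        # For an unrelated pair E[(a - b)^2] = 4p(1 - p), so each term averages 1.
        total += (a - b) ** 2 / (4.0 * p * (1.0 - p))
    return 1.0 - total / len(freqs)

g_a = [0, 1, 2, 1]
g_b = [0, 1, 2, 1]
freqs = [0.2, 0.5, 0.4, 0.3]

theta = relatedness_estimate(g_a, g_b, freqs)  # relatedness scale
kinship = theta / 2.0                          # kinship scale is half of it
print(theta, kinship)
```

With identical genotypes the relatedness estimate is 1 and the kinship-scale value is 0.5, which mirrors the doubling described above. Single estimates from few markers can fall outside the 0 to 1 range because of sampling noise.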

Inferencing Relatedness with DeepKin

DeepKin provides a method for researchers to test if pairs of individuals are related. By examining relationships through a statistical lens, DeepKin can calculate z-scores and corresponding p-values based on earlier empirical distributions. If researchers set a level of significance, deepKin can define a critical value for drawing conclusions about relatedness.
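A minimal sketch of this test is shown below. It rests on two assumptions that are not stated in the text: that the estimate for an unrelated pair is approximately normal with variance 2/me (a normal approximation, where me is the effective number of markers), and that the significance level is Bonferroni-corrected over all n(n-1)/2 pairs. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def null_sd(me):
    # Assumed s.d. of the estimate for an unrelated pair: sqrt(2 / me).
    return sqrt(2.0 / me)

def z_and_p(theta_hat, me):
    """z-score and one-sided p-value against the hypothesis of unrelatedness."""
    z = theta_hat / null_sd(me)
    p = STD_NORMAL.cdf(-z)  # upper-tail probability of z
    return z, p

def critical_value(me, n_samples, alpha=0.05):
    """Smallest estimate that is declared significant at level alpha."""
    n_pairs = n_samples * (n_samples - 1) // 2
    # Bonferroni correction across all pairs (an assumption of this sketch).
    z_crit = -STD_NORMAL.inv_cdf(alpha / n_pairs)
    return z_crit * null_sd(me)

z, p = z_and_p(0.06, me=20000)
print(f"z = {z:.2f}, p = {p:.2e}")
print(f"critical value: {critical_value(20000, 1000):.4f}")
```

Under these assumptions the critical value depends only on the sample size and the effective number of markers, so it shrinks as more independent markers become available.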

While relatedness scores can range continuously, it can be useful to group them into categories for easier analysis. DeepKin allows the assessment of an observed relationship against predefined degrees of relatedness using statistical tests.

The method involves two primary parameters: sample size and effective number of markers. Ultimately, deepKin aims to improve how genetic relationships are inferred by providing guidelines that help researchers make informed decisions.

Guidelines for Using DeepKin

Researchers can follow a couple of key guidelines when using deepKin:

Choose Markers Wisely: Researchers can pinpoint the minimum effective number of markers required to detect specific relationships. By focusing only on the necessary variants, researchers can save time and reduce costs.
Understand Statistical Power: Once the significance level is set, researchers can determine how much power is gained or lost depending on the number of markers available. Essentially, increasing the effective number of markers boosts the chances of identifying important relationships.
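The two guidelines can be sketched numerically as follows. The sketch assumes that a relative of degree d has expected relatedness (1/2)^d, that the estimate is approximately normal with variance 2(1 - theta)^2 / me, and that the significance level is Bonferroni-corrected over all pairs. These formulas and the function names are assumptions of this example rather than statements from the text.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def _z_alpha(n_samples, alpha):
    n_pairs = n_samples * (n_samples - 1) // 2
    return -STD_NORMAL.inv_cdf(alpha / n_pairs)  # Bonferroni-corrected quantile

def min_effective_markers(degree, n_samples, alpha=0.05, beta=0.1):
    """Smallest me that detects degree-`degree` relatives with power 1 - beta."""
    theta = 0.5 ** degree
    z_a = _z_alpha(n_samples, alpha)
    z_b = -STD_NORMAL.inv_cdf(beta)
    return 2.0 * ((z_a + z_b * (1.0 - theta)) / theta) ** 2

def power(degree, me, n_samples, alpha=0.05):
    """Probability that a true degree-`degree` pair exceeds the critical value."""
    theta = 0.5 ** degree
    crit = _z_alpha(n_samples, alpha) * sqrt(2.0 / me)
    sd_alt = sqrt(2.0 / me) * (1.0 - theta)  # assumed s.d. for a related pair
    return STD_NORMAL.cdf((theta - crit) / sd_alt)

for degree in (1, 2, 3, 4):
    me_needed = min_effective_markers(degree, n_samples=1000)
    print(f"degree {degree}: me >= {me_needed:,.0f}, "
          f"power at that me = {power(degree, me_needed, 1000):.2f}")
```

Because the expected relatedness halves with each degree, the required effective number of markers grows roughly fourfold per degree in this sketch, which is why distant relatives need far denser marker sets.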

The Importance of Effective Number of Markers

The effective number of markers, often referred to as "me", is central to estimating relatedness with deepKin. It reflects the average correlation (linkage disequilibrium) among variants: the more correlated the variants are, the fewer effectively independent markers they provide. Researchers can compute this number, but doing so directly can be costly in terms of computing power.

To address this issue, two estimators are proposed. The first is a GRM-based estimator, which looks at off-diagonal elements of the genetic relationship matrix. The second is a randomization-based estimator, which improves efficiency by iterating through a set number of trials.
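The sketch below shows one plausible reading of the two estimators, using NumPy. It assumes that me can be taken as the reciprocal of the variance of the off-diagonal GRM elements among unrelated samples, and that the randomization-based version approximates the same quantity by estimating the trace of the squared GRM from random vectors without ever building the GRM. The function names, the number of trials, and the simulated data are all hypothetical.

```python
import numpy as np

def standardize(genotypes):
    """Column-standardize an (n_samples x n_markers) matrix of 0/1/2 counts."""
    g = np.asarray(genotypes, dtype=float)
    sd = g.std(axis=0)
    keep = sd > 0  # drop monomorphic markers
    return (g[:, keep] - g[:, keep].mean(axis=0)) / sd[keep]

def me_from_grm(x):
    """GRM-based estimate: 1 / variance of the off-diagonal GRM elements."""
    n, m = x.shape
    grm = x @ x.T / m
    off_diag = grm[~np.eye(n, dtype=bool)]
    return 1.0 / off_diag.var()

def me_randomized(x, n_trials=200, seed=0):
    """Randomization-based estimate that never forms the n x n GRM."""
    n, m = x.shape
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, n_trials))
    kz = x @ (x.T @ z) / m                    # GRM times each random vector
    trace_k2 = (kz ** 2).sum(axis=0).mean()   # estimates trace(GRM^2)
    diag = (x ** 2).sum(axis=1) / m           # exact GRM diagonal
    sum_all = (x.sum(axis=0) ** 2).sum() / m  # sum of every GRM element
    n_off = n * (n - 1)
    mean_off = (sum_all - diag.sum()) / n_off
    mean_sq_off = (trace_k2 - (diag ** 2).sum()) / n_off
    return 1.0 / (mean_sq_off - mean_off ** 2)

# Simulated unrelated samples with independent markers, so me should be near 500.
rng = np.random.default_rng(1)
freqs = rng.uniform(0.1, 0.5, size=500)
x = standardize(rng.binomial(2, freqs, size=(200, 500)))
me_grm_value = me_from_grm(x)
me_rand_value = me_randomized(x)
print(round(me_grm_value), round(me_rand_value))
```

The randomized version costs a few matrix-vector products per trial instead of an n-by-n matrix, which is the kind of saving that matters at biobank scale. With correlated markers both estimates would fall below the raw marker count.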

In simulations, researchers validate deepKin's effectiveness using both estimators to demonstrate statistical precision.

Validating the Variance

The validation of deepKin's approach covers both single-locus and multiple-locus models. Researchers tested how well the expected results align with the observed data under various scenarios to confirm the robustness of their findings.

Simulations demonstrate that the deepKin method effectively captures true relationships, ensuring reliability across different degrees of relatedness.
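A small simulation in the same spirit can be sketched as follows. It is not the study's simulation design: it assumes independent markers with known allele frequencies (so the effective number of markers equals the marker count), uses the hypothetical pairwise estimator of one minus the scaled squared genotype difference, and compares unrelated pairs with parent-offspring pairs.

```python
import numpy as np

def simulate_pair_estimates(related, n_markers, n_reps, rng):
    """Relatedness estimates for simulated parent-offspring or unrelated pairs."""
    freqs = rng.uniform(0.2, 0.5, size=n_markers)
    shape = (n_reps, n_markers)
    # First individual: two alleles per marker, drawn independently.
    a1 = rng.binomial(1, freqs, size=shape)
    a2 = rng.binomial(1, freqs, size=shape)
    first = a1 + a2
    if related:
        # Offspring: one allele copied from the parent, one from the population.
        inherited = np.where(rng.random(shape) < 0.5, a1, a2)
        second = inherited + rng.binomial(1, freqs, size=shape)
    else:
        second = rng.binomial(2, freqs, size=shape)
    scaled = (first - second) ** 2 / (4.0 * freqs * (1.0 - freqs))
    return 1.0 - scaled.mean(axis=1)

rng = np.random.default_rng(42)
results = {}
for label, related in (("unrelated", False), ("parent-offspring", True)):
    for n_markers in (500, 2000):
        est = simulate_pair_estimates(related, n_markers, 400, rng)
        results[(label, n_markers)] = (est.mean(), est.var())
        print(f"{label:16s} m={n_markers:4d} "
              f"mean={est.mean():+.3f} var={est.var():.5f}")
```

In this setting the estimates centre on the expected relatedness (0 for unrelated pairs, 0.5 for parent and offspring), and the variance shrinks roughly in proportion to the number of markers, which is the behaviour a variance formula based on the effective number of markers is meant to capture.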

Real-World Applications: UK Biobank

In a practical application, researchers applied deepKin to a large dataset from the UK Biobank, which included information from over 3,000 participants. They examined multiple SNP sets with different characteristics to understand the impact of different genetic markers.

By doing this, researchers could observe how deepKin performed when classifying pairs into different degrees of relatedness. It was confirmed that as the effective number of markers increased, deepKin became more reliable in classifying relationships.

Furthermore, deepKin explained the relationships within the UK Biobank dataset, highlighting related individuals and their connections based on geographical locations. This added depth to the understanding of how population structure can influence genetic relationships.

Key Findings and Conclusions

The difference between deepKin and earlier methods, such as KING, lies in deepKin's ability to account for the sampling variance that earlier methods leave out, which strengthens statistical inference. How well the sampling variance is understood directly determines how effective relatedness inference can be.

Moreover, the effective number of markers plays a critical role, allowing researchers to fine-tune their analyses for optimal results. In turn, this can influence how researchers assess relationships, particularly when considering allele frequencies in SNP sets.

Researchers suggest further studies to refine the assumptions made in models and encourage the removal of low-frequency variants to avoid misleading results.

Overall, deepKin offers a fresh approach to genetic relationship analysis that can be used in various fields, including genetics and forensic applications. It brings a new level of precision and rigor to understanding how individuals are related based on genetic data.
