Simple Science

Cutting edge science explained simply

New Insights into ASD Through Noncoding Variants

Research explores noncoding genetic variants linked to autism spectrum disorder.

Whole-genome sequencing (WGS) reads nearly all of a person’s DNA, including the large stretches that lie outside genes. Recently, scientists have started using WGS to study how certain rare changes in DNA, especially noncoding changes that do not alter the proteins made by genes, relate to complex traits in people. One complex trait of interest is autism spectrum disorder (ASD), a developmental condition that affects communication and behavior.

Background on Genetic Studies and ASD

In the past, researchers used narrower kinds of genetic data, such as genotyping arrays or exome sequencing, to link common or rare genetic changes to ASD. However, these methods capture few rare variants in noncoding regions, so they did not allow a deep investigation into how such changes might influence ASD.

One important dataset that researchers have turned to is the Simons Simplex Collection (SSC). It includes WGS information from many families in which one child has ASD. Researchers are particularly interested in "de novo" variants, which are genetic changes that are new in the child and not inherited from the parents. These variants matter because they have not yet been filtered by natural selection across generations, so changes with a large effect are more likely to be found among them.

Challenges of Analyzing Noncoding Variants

Despite having access to this valuable data, analyzing the link between noncoding variants and ASD comes with challenges. In early studies of the SSC data, many methods were tried to find associations between these variants and ASD, but none showed a significant association after correcting for multiple tests. However, one analysis did find some associations using a risk score method, which pointed to connections between the variants and the regions of DNA that regulate gene activity.
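To make "correcting for multiple tests" concrete, here is a minimal sketch of a Bonferroni correction, one of the simplest such procedures. The source does not say which correction the early studies used, so the method, the function name, and the p-values below are all illustrative assumptions.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and significance flags.

    Each raw p-value is multiplied by the number of tests (capped at 1.0).
    This is a generic illustration, not the procedure from the study.
    """
    n_tests = len(p_values)
    adjusted = [min(1.0, p * n_tests) for p in p_values]
    flags = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, flags


# Hypothetical raw p-values from four separate association tests.
raw_p_values = [0.02, 0.04, 0.30, 0.001]
adjusted, significant = bonferroni(raw_p_values)

for raw, adj, flag in zip(raw_p_values, adjusted, significant):
    print(f"raw={raw:.3f} adjusted={adj:.3f} significant={flag}")
```

Results that look convincing on their own (such as a raw p-value of 0.02) can stop being significant once the number of tests is taken into account, which is the situation the early SSC analyses ran into.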

To better understand the contributions of de novo noncoding variants to ASD, researchers developed a disease impact score (DIS) based on features like chromatin accessibility and the binding of proteins that help control gene activity. They trained neural networks to predict these features based on the DNA sequences. By combining these predictions, they created scores to evaluate the importance of specific variants.

Need for Simpler Methods

The approach using neural networks, while powerful, is complex and difficult to interpret. This prompted researchers to look for simpler methods to see if they could find the same or even stronger signals without the heavy computational load. Early findings suggested that local GC content, essentially the proportion of G and C bases in the DNA sequence around a variant, might be enough to identify similar associations.
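Local GC content is simple enough to compute in a few lines. The sketch below counts the fraction of G and C bases in a window centered on a variant. The window size (`flank`), the function name, and the toy sequence are assumptions for illustration; the source does not state the window the researchers used.

```python
import math


def local_gc_content(sequence, position, flank=50):
    """Fraction of G/C bases within +/- `flank` bases of a 0-based position.

    Ambiguous bases (for example N) are ignored. Returns NaN if the
    window contains no called bases.
    """
    start = max(0, position - flank)
    end = min(len(sequence), position + flank + 1)
    window = sequence[start:end].upper()
    called = [base for base in window if base in "ACGT"]
    if not called:
        return math.nan
    gc_count = sum(base in "GC" for base in called)
    return gc_count / len(called)


# Toy sequence and variant position, invented for this example.
toy_sequence = "ATGCGCGCATATATGCGCNN"
gc = local_gc_content(toy_sequence, position=4, flank=3)
print(f"local GC content: {gc:.3f}")
```

A single number per variant like this is far easier to inspect than the output of a neural network, which is what makes it attractive as a baseline.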

Finding Insights with Local GC Content

Initial investigations showed a strong link between local GC content and the DIS. This led researchers to consider whether local GC content alone could yield similar association results. They compared groups of variants based on local GC content and used established genomic variants to test these associations.

When they conducted these comparisons, they found that local GC content could indeed explain many of the associations previously attributed to more complex models. However, they also recognized that while local GC content could capture significant signals, it does not rule out the possibility that other unique sequence features could also be important.
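One generic way to ask whether two groups of variants differ in local GC content is a permutation test on the difference in means. The source does not name the statistical test the researchers used, so this is a stand-in technique, and the function name and GC values are invented for illustration.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns (observed difference, estimated p-value). Group labels are
    shuffled repeatedly to build a null distribution.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    n_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_a, shuffled_b = pooled[:n_a], pooled[n_a:]
        diff = sum(shuffled_a) / n_a - sum(shuffled_b) / len(shuffled_b)
        if abs(diff) >= abs(observed):
            n_extreme += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    p_value = (n_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical local GC values for proband and sibling variants.
proband_gc = [0.60, 0.62, 0.65, 0.63, 0.61, 0.64]
sibling_gc = [0.40, 0.42, 0.45, 0.43, 0.41, 0.44]
observed, p_value = permutation_test(proband_gc, sibling_gc, n_permutations=2000)
print(f"difference in mean GC: {observed:.3f}, p ~ {p_value:.4f}")
```

A permutation test makes no assumption about the shape of the GC distribution, which is convenient for a quick check, though a real analysis would need far more variants than this toy example.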

Investigating Specific Factors

Given that ASD is diagnosed more often in males, researchers looked at how sex differences between probands (individuals with ASD) and their siblings might influence the local GC content of genetic variants. They found that the strongest signals came from families where the proband was male and the sibling was female. This suggested a more nuanced relationship in which variants near certain genes might contribute differently depending on the sex of the proband and sibling.
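The stratification described above can be sketched as grouping families by the sex of the proband and sibling, then comparing mean local GC content within each group. The record layout, field names, and values here are hypothetical, not the format of the SSC data.

```python
from collections import defaultdict


def mean_gc_difference_by_sex_pair(families):
    """Mean proband GC minus mean sibling GC for each (proband, sibling) sex pair."""
    grouped = defaultdict(lambda: {"proband": [], "sibling": []})
    for family in families:
        key = (family["proband_sex"], family["sibling_sex"])
        grouped[key]["proband"].extend(family["proband_gc"])
        grouped[key]["sibling"].extend(family["sibling_gc"])

    differences = {}
    for key, values in grouped.items():
        # Skip groups that lack variants on either side.
        if values["proband"] and values["sibling"]:
            proband_mean = sum(values["proband"]) / len(values["proband"])
            sibling_mean = sum(values["sibling"]) / len(values["sibling"])
            differences[key] = proband_mean - sibling_mean
    return differences


# Hypothetical families; each list holds local GC values of de novo variants.
toy_families = [
    {"proband_sex": "M", "sibling_sex": "F", "proband_gc": [0.6, 0.5], "sibling_gc": [0.4]},
    {"proband_sex": "M", "sibling_sex": "F", "proband_gc": [0.7], "sibling_gc": [0.5, 0.3]},
    {"proband_sex": "M", "sibling_sex": "M", "proband_gc": [0.5], "sibling_gc": [0.5]},
]
differences = mean_gc_difference_by_sex_pair(toy_families)
for sex_pair, diff in differences.items():
    print(sex_pair, round(diff, 3))
```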

In light of these insights, researchers created a new analytical method called the Expression Neighborhood Sequence Association Study (ENSAS). This approach allows for more comprehensive analysis by looking at not only local GC content but also specific sequences around genes linked to gene expression patterns.

ENSAS Framework and Analyses

ENSAS was designed to systematically define gene expression neighborhoods. For each gene, it identifies nearby genes based on their expression patterns and includes variants from these genes in its analysis. By focusing on these neighborhoods, ENSAS can find associations that might be missed by looking at individual genes or variants alone.
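The idea of an expression neighborhood can be sketched as follows: for an anchor gene, rank all other genes by how closely their expression profiles correlate with it, and keep the closest ones. The original abstract says ENSAS uses gene expression correlations, but the top-`k` cutoff, the function names, and the expression values below are assumptions made for illustration, not the published procedure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def expression_neighborhood(expression, anchor, k=2):
    """Anchor gene plus the k genes whose profiles correlate most with it."""
    scored = [
        (pearson(expression[anchor], profile), gene)
        for gene, profile in expression.items()
        if gene != anchor
    ]
    scored.sort(reverse=True)
    return [anchor] + [gene for _, gene in scored[:k]]


# Hypothetical expression levels across four tissues or samples.
toy_expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0],  # closely tracks GENE_A
    "GENE_C": [4.0, 3.0, 2.0, 1.0],  # moves opposite to GENE_A
    "GENE_D": [1.0, 2.5, 2.9, 4.2],  # loosely tracks GENE_A
}
neighborhood = expression_neighborhood(toy_expression, "GENE_A", k=2)
print(neighborhood)
```

Variants assigned to any gene in the neighborhood would then be pooled, giving each test more variants to work with than a single gene would provide.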

When researchers applied ENSAS to the SSC data, they focused specifically on variants from male probands and female siblings located upstream of their assigned genes. This targeted approach improved the ability to identify local GC content differences that were significantly linked to ASD.

Results from ENSAS Analysis

Using ENSAS, researchers discovered several neighborhoods with significant differences in local GC content between proband and sibling variants. The top neighborhoods were enriched for genes associated with synaptic functions, suggesting that these genetic changes could be linked to key biological processes involved in ASD.

Additional analyses of chromatin states (annotations that describe, for example, how accessible or closed the DNA is in different tissues) helped explain some of the local GC content differences observed. The results indicated that certain chromatin states were more prevalent among variants from probands, suggesting a biological basis for the differences in local GC content.

Exploring the Specificity of the Association Signal

To ensure that the signals detected were not merely related to technical issues in sequencing, researchers compared segments of data from individuals whose samples were processed in the same sequencing lanes. This analysis revealed that significant associations persisted even among samples with matching lanes, indicating that the observed signals were likely not driven by sequencing batch effects.
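A lane-matched check of this kind can be sketched as a simple filter that keeps only proband-sibling pairs sequenced in the same lane before the comparison is rerun. The record layout and lane identifiers are hypothetical, and the source does not describe how the matching was actually implemented.

```python
def lane_matched_pairs(pairs):
    """Keep only proband-sibling pairs whose samples share a sequencing lane."""
    return [
        pair for pair in pairs
        if pair["proband_lane"] == pair["sibling_lane"]
    ]


# Hypothetical family records with invented lane identifiers.
toy_pairs = [
    {"family": "F1", "proband_lane": "L001", "sibling_lane": "L001"},
    {"family": "F2", "proband_lane": "L002", "sibling_lane": "L003"},
    {"family": "F3", "proband_lane": "L004", "sibling_lane": "L004"},
]
matched = lane_matched_pairs(toy_pairs)
print([pair["family"] for pair in matched])
```

If an association survives in the matched subset, differences between lanes cannot be the whole explanation, which is the logic of the check described above.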

The specificity of the signals was further tested by applying the ENSAS framework to other groups, such as male probands with male siblings and female probands with either male or female siblings. These analyses revealed that the strongest associations were unique to the male proband-female sibling group.

Insights from Other Datasets

Finally, researchers tested their findings by applying ENSAS to an independent dataset derived from a population in Iceland. This dataset also contained information from trios (parents and offspring), which allowed for similar analyses. When applying the same methodology, the researchers found no significant associations in the Icelandic data, which is consistent with the signals identified in the SSC dataset being specific to the ASD phenotype.

Conclusion and Future Directions

Overall, the study supports the idea that noncoding genetic variants can contribute to complex traits like ASD. By combining existing and new analytical methods, researchers can gain insights into the genetic underpinnings of ASD and potentially other phenotypes. The findings not only shed light on the importance of rare noncoding variants but also highlight the value of simpler analytical approaches that make results clearer and more interpretable.

As WGS data continues to grow and more populations are studied, it will be essential to build on these methods and explore further how genetic differences relate to complex traits in diverse groups. Future research might also investigate the biological mechanisms behind the observed genetic associations and how they translate to the behaviors and symptoms associated with ASD.

Original Source

Title: Identifying associations of de novo noncoding variants with autism through integration of gene expression, sequence and sex information

Abstract: Whole-genome sequencing (WGS) data is facilitating genome-wide identification of rare noncoding variants, while elucidating their roles in disease remains challenging. Towards this end, we first revisit a reported significant brain-related association signal of autism spectrum disorder (ASD) detected from de novo noncoding variants attributed to deep-learning and show that local GC content can capture similar association signals. We further show that the association signal appears driven by variants from male proband-female sibling pairs that are upstream of assigned genes. We then develop Expression Neighborhood Sequence Association Study (ENSAS), which utilizes gene expression correlations and sequence information, to more systematically identify phenotype-associated variant sets. Applying ENSAS to the same set of de novo variants, we identify gene expression-based neighborhoods showing significant ASD association signal, enriched for synapse-related gene ontology terms. For these top neighborhoods, we also identify chromatin states annotations of variants that are predictive of the proband-sibling local GC content differences. Our work provides new insights into associations of non-coding de novo mutations in ASD and presents an analytical framework applicable to other phenotypes.

Authors: Jason Ernst, R. Li

Last Update: 2024-03-21 00:00:00

Language: English

Source URL: https://www.biorxiv.org/content/10.1101/2024.03.20.585624

Source PDF: https://www.biorxiv.org/content/10.1101/2024.03.20.585624.full.pdf

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to biorxiv for use of its open access interoperability.
