Simple Science

Cutting edge science explained simply

RNA Sequencing: Insights and Challenges in Genetic Studies

Exploring the benefits and limitations of RNA sequencing in genetic research.

RNA sequencing, or RNA-seq, is a technique for measuring how strongly genes are expressed in cells. It has become popular for several applications, including quantifying gene expression, studying how RNA transcripts are spliced, and finding DNA variants that could affect gene expression.

Why Use RNA-seq?

RNA-seq offers a way to gather information on gene activity. It allows researchers to see which genes are turned on or off in different tissues. However, there are challenges when relying solely on RNA-seq for genetic studies. RNA-seq experiments usually generate fewer reads than DNA sequencing. RNA-seq also often focuses on messenger RNA (mRNA), which is transcribed from only a small portion of an organism's genome. As a result, many genetic variants are missed when RNA-seq is used alone.

Differences Between RNA-seq and DNA Sequencing

When scientists conduct DNA sequencing, they gather a comprehensive view of the entire genome, which includes all coding and non-coding regions of DNA. In contrast, RNA-seq is limited to the regions that are transcribed into RNA, mainly those involved in protein production. As a result, fewer genetic variants are detected with RNA-seq.

For example, typical DNA studies might analyze hundreds of millions of reads for thorough coverage, while RNA studies generally analyze fewer reads, which reduces the reliability of variant calls. RNA-seq experiments that only measure gene expression typically aim for around 30 million reads, while more detailed studies may require around 100 million reads.
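As a rough illustration of why read counts matter, the sketch below estimates average depth as reads × read length / target size. The read length, genome size, and expressed-region size are illustrative assumptions, not values from the source study, and real RNA-seq coverage is far from even because it follows gene expression.

```python
def mean_depth(n_reads: int, read_length: int, target_size: int) -> float:
    """Average depth if reads were spread evenly over the target."""
    return n_reads * read_length / target_size


# Illustrative assumptions (not taken from the source study)
READ_LENGTH = 150                  # bases per read
GENOME_SIZE = 2_700_000_000        # rough size of a mammalian genome
EXPRESSED_SIZE = 100_000_000       # hypothetical size of expressed regions

dna_depth = mean_depth(300_000_000, READ_LENGTH, GENOME_SIZE)
rna_depth = mean_depth(30_000_000, READ_LENGTH, EXPRESSED_SIZE)

print(f"DNA: ~{dna_depth:.1f}x averaged over the genome")
print(f"RNA: ~{rna_depth:.1f}x averaged over expressed regions")
```

The RNA average hides a wide spread: a few highly expressed genes absorb most reads, while weakly expressed genes may receive too few reads to call variants at all.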

The Challenge of Variant Calling

In genetic research, identifying variations in DNA, called variant calling, is essential. These variations can help us understand differences in traits among individuals. However, because RNA-seq captures fewer variations compared to DNA sequencing, relying only on RNA data can lead to incomplete insights.

Studies using RNA-seq have indicated that fewer genetic variants are identified than in DNA sequencing. For instance, early research with RNA-seq in cattle identified a limited number of genetic variations compared to what was found in similar DNA studies. This mismatch raises questions about the reliability of variant calling when using RNA-seq data alone.

Improving Variant Calling with Advanced Techniques

Variant calling from RNA-seq data is less common than from DNA sequencing data. However, better tools for calling variants from RNA data are now available. One such tool is DeepVariant, which has recently been adapted for RNA-seq. This adaptation has led to more accurate variant calling, making RNA-seq a more reliable option for genetic studies than before.

With these advancements, researchers can also identify RNA editing events, where the RNA molecule is modified after transcription in ways that can affect gene function. Recognizing these changes is important because they may influence how genes affect traits.

Studying Expression and Splicing Quantitative Trait Loci (e/sQTL)

Researchers also focus on understanding how certain genetic regions affect traits through expression and splicing quantitative trait loci (e/sQTL). This involves looking for connections between genetic variations and the levels of gene expression or splicing. By analyzing e/sQTL, scientists can determine how genetic variations within populations relate to important traits in livestock, like fertility or milk production.
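The core of an eQTL test can be sketched as a regression of expression on genotype dosage (0, 1, or 2 copies of the alternate allele). This is a minimal illustration on made-up numbers; the function name and data are hypothetical, and real eQTL mapping additionally handles covariates, normalization, and multiple testing.

```python
from statistics import mean


def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on alternate-allele dosage."""
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        # Every animal has the same genotype, so no effect can be estimated
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    return sxy / sxx


# Toy data: six animals, one variant, one gene (hypothetical values)
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]

slope = eqtl_slope(dosages, expression)
print(f"Estimated change in expression per alternate allele: {slope:.2f}")
```

A slope far from zero, relative to its uncertainty, suggests that the variant is associated with expression of the gene.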

Our Research

In our research, we applied DeepVariant to call genetic variants from both DNA sequencing and deep total RNA sequencing of three tissues from 74 male Braunvieh cattle. RNA sequencing recovered approximately 40% of the variants called from DNA sequencing, with over 80% precision, and the RNA-seq variant calls were enriched for eQTL.

Through our analysis, we observed a substantial number of single nucleotide differences between DNA and RNA sequences. We also noticed that the quality of variant calling decreased when the sequencing coverage dropped. This indicates that having enough reads is crucial for reliable results.

Alignment of DNA and RNA Data

In our study, we utilized publicly available DNA and RNA sequencing data from cattle tissues. Using alignment tools, we mapped DNA and RNA reads to a reference genome. By analyzing the coverage of the reference genome, we discovered that DNA sequencing provided more consistent coverage across different regions compared to RNA sequencing.

RNA-seq coverage was notably enriched in protein-coding regions, which is expected. However, we also observed RNA coverage in regions that do not typically produce proteins, suggesting that the RNA data capture more than just the coding sequences.

Calling Variants from Aligned Data

We used DeepVariant to call variants from our DNA and RNA data. Our findings indicated that while DNA sequencing called more variants overall, RNA sequencing still captured a significant number of variants, particularly in strongly expressed genes.

The results also showed that most of the variants called from RNA data were indeed in regions that corresponded to expressed genes, and a large portion of the variants overlapped between DNA and RNA datasets.
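One way to quantify this overlap is to treat the DNA calls as the reference set and compute recall (the share of DNA variants also found in RNA) and precision (the share of RNA variants confirmed by DNA). The sketch below uses toy variants keyed by chromosome, position, and alleles; the function name and data are hypothetical.

```python
def compare_calls(dna_calls: set, rna_calls: set) -> dict:
    """Overlap statistics, treating DNA calls as the reference set."""
    shared = dna_calls & rna_calls
    return {
        "shared": len(shared),
        "recall": len(shared) / len(dna_calls) if dna_calls else 0.0,
        "precision": len(shared) / len(rna_calls) if rna_calls else 0.0,
    }


# Toy variants as (chromosome, position, ref, alt); not real data
dna = {
    ("1", 100, "A", "G"), ("1", 250, "C", "T"), ("2", 40, "G", "A"),
    ("2", 900, "T", "C"), ("3", 77, "C", "G"),
}
rna = {("1", 100, "A", "G"), ("2", 40, "G", "A"), ("3", 500, "A", "G")}

stats = compare_calls(dna, rna)
print(stats)
```

Restricting both sets to highly expressed coding regions before comparing them is how statistics for those regions would be obtained.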

Analyzing the Accuracy of RNA Variants

In our analysis, we evaluated how accurately the RNA-seq variants represented the genetic variants found through DNA sequencing. We found that the level of gene expression strongly influenced the accuracy of variant calling. In highly expressed coding regions, RNA sequencing recovered over 92% of the DNA variants with nearly 98% precision. For weakly expressed genes, accuracy dropped substantially.

We also identified issues related to allele-specific expression (ASE), where certain alleles could be preferentially expressed over others, complicating the variant calling process. This effect impacted our ability to reliably determine genetic differences in some cases.
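To see why ASE complicates genotyping, consider a truly heterozygous site: DNA reads should split roughly evenly between the two alleles, but RNA reads may come almost entirely from one allele and the site can then look homozygous. A minimal sketch of an exact two-sided binomial test for allelic imbalance follows; the read counts are invented and the function name is hypothetical.

```python
from math import comb


def allelic_imbalance_p(ref_reads: int, alt_reads: int) -> float:
    """Exact two-sided binomial p-value against a 50:50 allele ratio."""
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    probs = [comb(n, i) / 2 ** n for i in range(n + 1)]
    observed = probs[ref_reads]
    # Sum the probability of all outcomes at least as unlikely as the observed one
    total = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, total)


balanced = allelic_imbalance_p(5, 5)   # what DNA reads might look like
skewed = allelic_imbalance_p(10, 0)    # RNA reads from one allele only

print(f"balanced site p = {balanced:.3f}")
print(f"skewed site p   = {skewed:.4f}")
```

A small p-value at a site known from DNA to be heterozygous is consistent with ASE, although mapping bias and low coverage can produce similar patterns.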

Importance of Gene Expression Levels

We discovered that RNA-seq data, especially when analyzed at deeper coverage, could provide valuable insights into genetic variations that influence traits. However, the accuracy of the analysis depends heavily on how strongly genes are expressed in the examined tissues.

Our results indicated that RNA-seq can call millions of variants, but it recovers only a fraction of those found by DNA sequencing. Expression levels must be considered in the analysis, because variants in weakly expressed genes are easily missed.

Investigating Expression QTL

We explored whether RNA-seq data alone could be used for mapping expression QTL (eQTL), using RNA-seq variants to test for association with gene expression levels. Variants called from RNA sequencing detected roughly 75% of the eGenes identified with DNA-seq variants, suggesting that RNA-seq can provide a valuable alternative when DNA data is not available.

However, there were still significant discrepancies with the eQTL identified using DNA-seq. Nominal association p-values were moderately to strongly correlated between the two variant sets, but only 9% of eGenes had the same top associated variant, and several highly significant eQTL were found only with RNA variants. This indicates a need for caution when interpreting RNA-only results.
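Two simple checks can make such comparisons concrete: a Spearman rank correlation of association p-values for variants present in both sets, and a test of whether the top-associated variant for a gene is the same. The sketch below uses invented p-values for a single hypothetical gene; it is not the authors' pipeline.

```python
from statistics import mean


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / (var_x * var_y) ** 0.5


def same_top_variant(dna_p: dict, rna_p: dict) -> bool:
    """True if the smallest p-value belongs to the same variant in both sets."""
    return min(dna_p, key=dna_p.get) == min(rna_p, key=rna_p.get)


# Invented p-values for one gene (variant ID -> nominal p-value)
dna_p = {"v1": 1e-8, "v2": 1e-5, "v3": 0.02, "v4": 0.4}
rna_p = {"v1": 1e-6, "v2": 1e-7, "v3": 0.05, "v4": 0.3}

shared = sorted(set(dna_p) & set(rna_p))
rho = spearman([dna_p[v] for v in shared], [rna_p[v] for v in shared])
top_matches = same_top_variant(dna_p, rna_p)
print(f"Spearman rho = {rho:.2f}, same top variant: {top_matches}")
```

In this toy case the p-values rank similarly overall, yet the lead variant differs, which mirrors how a good correlation can coexist with poor agreement on the top association.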

Understanding RNA-DNA Differences

Throughout our research, we found numerous instances where RNA-seq identified variants that were not present in DNA sequencing. These differences raise questions about their true origins. Some may reflect RNA editing events, while others could be due to technical limitations in sequencing.
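A first-pass triage of RNA-only calls might separate substitutions compatible with A-to-I editing from the rest. Edited adenosines are read as guanosine, so they appear as A>G, or as T>C when the transcript comes from the reverse strand. The sketch below is a hypothetical illustration on toy data; substitution type alone does not prove editing, and strand information would be needed for a proper analysis.

```python
# Substitutions compatible with A-to-I editing on either strand
EDITING_LIKE = {("A", "G"), ("T", "C")}


def classify_rna_only(dna_calls: set, rna_calls: set):
    """Split RNA-only SNVs into editing-like and other substitutions."""
    editing_like, other = [], []
    for variant in sorted(rna_calls - dna_calls):
        _, _, ref, alt = variant
        if (ref, alt) in EDITING_LIKE:
            editing_like.append(variant)
        else:
            other.append(variant)
    return editing_like, other


# Toy variants as (chromosome, position, ref, alt); not real data
dna_calls = {("1", 100, "A", "G"), ("2", 40, "G", "A")}
rna_calls = {
    ("1", 100, "A", "G"), ("1", 310, "A", "G"),
    ("2", 75, "C", "A"), ("3", 12, "T", "C"),
}

editing_like, other = classify_rna_only(dna_calls, rna_calls)
print("editing-like:", editing_like)
print("other:", other)
```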

Despite the uncertainties, our findings show that RNA-seq reports a set of variants that are absent from the DNA sequencing calls. Allele-specific expression and putative post-transcriptional modifications both contribute to these RNA-DNA differences, so they need to be investigated thoroughly before they are interpreted as genuine genetic variation.

Conclusion

RNA sequencing is a crucial tool for studying gene expression but comes with limitations in variant calling. Although RNA-seq can uncover valuable genetic information, its dependence on gene expression levels can lead to inconsistencies when compared with DNA sequencing.

Our findings suggest that while RNA-seq can identify a significant number of genetic variants related to traits, it should not completely replace DNA sequencing methods. Each technique has its strengths and weaknesses, and using them in conjunction could provide a more comprehensive understanding of genetic variation and its implications for traits in livestock and other organisms.

Future research should continue to refine RNA sequencing methods, address the challenges of variant calling, and explore the origins and implications of the observed differences between RNA and DNA data.

Original Source

Title: RNA sequencing variants are enriched for eQTL in cattle tissues

Abstract: Association testing between molecular phenotypes and genomic variants can help to understand how genotype affects phenotype. RNA sequencing provides access to molecular phenotypes such as gene expression and alternative splicing while DNA sequencing or microarray genotyping are the prevailing options to obtain genomic variants. Here we genotype variants for 74 male Braunvieh cattle from both DNA and deep total RNA sequencing from three tissues. We show that RNA sequencing calls approximately 40% of variants (7-10 million) called from DNA sequencing, with over 80% precision, rising to over 92% of variants called with nearly 98% precision in highly expressed coding regions. Allele-specific expression and putative post-transcriptional modifications negatively impact variant genotyping accuracy from RNA sequencing and contribute to RNA-DNA differences. Variants called from RNA sequencing detect roughly 75% of eGenes identified using variants called from DNA sequencing, demonstrating a nearly 2-fold enrichment of eQTL variants. We observe a moderate-to-strong correlation in nominal association p-values (Spearman ρ² ~ 0.6), although only 9% of eGenes have the same top associated variant. We also find several highly significant RNA variant-only eQTL, demonstrating that caution must be exercised beyond filtering for variant quality or imputation accuracy when analysing or imputing variants called from RNA sequencing.

Authors: Alexander S Leonard, X. M. Mapel, H. Pausch

Last Update: 2024-05-02 00:00:00

Language: English

Source URL: https://www.biorxiv.org/content/10.1101/2024.04.29.591607

Source PDF: https://www.biorxiv.org/content/10.1101/2024.04.29.591607.full.pdf

Licence: https://creativecommons.org/licenses/by-nc/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to biorxiv for use of its open access interoperability.
