Simple Science

Cutting edge science explained simply


deepSpecas: A New Tool for Alternative Splicing Detection

deepSpecas aids researchers in identifying alternative splicing events from RNA-Seq data with high accuracy.

Simone Ciccolella, Luca Denti, Jorge Avila Cartes, Gianluca Della Vedova, Yuri Pirola, Raffaella Rizzi, Paola Bonizzoni




Alternative splicing (AS) is a clever way for our cells to make many different proteins from a single gene. Think of it like a Swiss Army knife: one tool that can do a lot of different things! This process matters both in normal body functions and in disease.

The Role of Next-generation Sequencing

Next-Generation Sequencing (NGS) is a fancy term for high-tech methods that let scientists read lots of DNA quickly. It’s like upgrading from a slow book reader to a speedy e-reader. This technology allows researchers to analyze gene expression more effectively in two main ways. First, they can look at how genes are expressed at a much finer level, and second, they can find new genes and variations that we didn't know existed before.

The Limitations of Early NGS Technologies

However, when NGS techniques first appeared, they had some hiccups. One of the biggest issues was the short length of the sequence pieces (or "reads") that they could analyze at once. This made it tricky to tell apart similar-looking sequences. Since alternative splicing produces different transcripts of the same gene that share many of the same segments, these early tools sometimes mixed things up. This made counting specific RNA transcripts (the messenger molecules that help make proteins) complicated.
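To make the mix-up concrete, here is a minimal sketch in Python. The exon sequences and the two isoforms are made up for illustration and do not come from the paper. A short read that falls inside a shared exon fits both isoforms, while a read that crosses an exon junction fits only one.

```python
# Toy exon sequences, invented for this illustration only.
EXON1 = "ATGGCCAAGT"
EXON2 = "CCTTGGAACG"
EXON3 = "TTAGGCATGA"

# Two hypothetical isoforms of one gene: one keeps exon 2, the other skips it.
ISOFORMS = {
    "full": EXON1 + EXON2 + EXON3,
    "skipping": EXON1 + EXON3,
}


def compatible_isoforms(read):
    """Return the sorted names of all isoforms that contain the read."""
    return sorted(name for name, seq in ISOFORMS.items() if read in seq)


# A short read lying entirely inside exon 1, which both isoforms share.
short_read = EXON1[2:10]
# A read spanning the exon 1 / exon 3 junction, present only when exon 2 is skipped.
junction_read = EXON1[-5:] + EXON3[:5]

print(compatible_isoforms(short_read))     # ['full', 'skipping']
print(compatible_isoforms(junction_read))  # ['skipping']
```

The first read cannot tell the two isoforms apart, which is exactly why counting transcripts from short reads is hard.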

Why Counting Transcripts Matters

Counting different RNA types is important because it helps researchers identify which genes are active in specific situations. With so many genes to choose from, focusing on those few that matter can lead to better understanding of how certain conditions manifest.

Tools for Transcript Quantification

Various programs can help with counting these RNA transcripts. Tools like StringTie, Cufflinks, Scripture, and IsoLasso help piece together and count RNA sequences. Other tools like Kallisto and Salmon focus on quantifying specific transcripts based on input data.

Detecting Alternative Splicing Events

In addition to counting, identifying AS events is also crucial. This means figuring out which forms of a gene are active in different samples. Rather than looking for differences in how much RNA is produced, some methods focus on finding the specific AS events that differ between samples. Tools such as rMATS and SpliceSeq were designed for this purpose. They are like detectives trying to figure out how events change in different samples by looking for key signals in the data.
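One common way to express an event-level signal is the "percent spliced in" (PSI): the share of junction reads that support keeping an exon. The sketch below is a simplified illustration with made-up read counts. It is not how rMATS, SpliceSeq, or deepSpecas compute their results (real tools add length normalization and statistical testing), and the function name `psi` is our own.

```python
def psi(inclusion_reads, skipping_reads):
    """Fraction of junction reads that support inclusion of the exon."""
    total = inclusion_reads + skipping_reads
    if total == 0:
        raise ValueError("no junction reads cover this event")
    return inclusion_reads / total


# Hypothetical junction read counts for one exon-skipping event.
sample_a = {"inclusion": 45, "skipping": 5}
sample_b = {"inclusion": 10, "skipping": 30}

psi_a = psi(sample_a["inclusion"], sample_a["skipping"])
psi_b = psi(sample_b["inclusion"], sample_b["skipping"])
delta_psi = psi_a - psi_b  # a large difference hints at a differential event

print(f"PSI A={psi_a:.2f}, PSI B={psi_b:.2f}, delta={delta_psi:.2f}")
# PSI A=0.90, PSI B=0.25, delta=0.65
```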

Using Deep Learning in Bioinformatics

Recently, some researchers have started using deep learning (think of it as teaching computers to learn from examples, as we do) to tackle challenges in bioinformatics. Techniques like Convolutional Neural Networks (CNNs), especially a type called Residual Neural Networks (ResNet), are being employed for tasks such as spotting variants, classifying data, and analyzing gene expression.

Introducing deepSpecas

We’ve developed a new tool called deepSpecas to find alternative splicing events that differ between two RNA-Seq samples. The tool represents the read alignments as images, a form that deep learning models handle well. The idea is to avoid depending on a specific gene annotation (a gene map), which may not always be complete. Imagine trying to decipher a treasure map that’s missing half the details!

How deepSpecas Works

deepSpecas takes input alignments of two RNA-Seq samples and a list of genomic regions where alternative splicing might occur. After analyzing these regions, the program predicts which specific AS events each sample is expressing.

Input Requirements

To get started, you need the read alignments in a specific format (BAM) from two RNA-Seq samples. You also need to specify the genomic regions of interest. The tool creates visual representations (images) of the data, making it easier for the computer to process.

Training the Deep Learning Model

To train the deep learning model, we used synthetic RNA-seq samples to create a solid set of labeled examples. Starting with a well-known gene annotation, we isolated regions where alternative splicing events occurred. Then, we simulated realistic RNA-seq reads, aligning them back to a reference genome.

Image Encoding

The tool represents the read alignments as images, mimicking how genome browsers display data. Six different encodings were defined, for example showing coverage levels across a region or lining up read patterns for visual comparison.
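As a rough illustration of the coverage idea, the sketch below turns read intervals into a per-base depth track and then into a small black-and-white bar plot, one panel per sample. It is only a toy: the actual deepSpecas encodings are not described in this summary, and the function names, the read coordinates, and the 4-pixel height are assumptions chosen for the example.

```python
def coverage(region_start, region_end, reads):
    """Per-base read depth over the half-open region [region_start, region_end)."""
    depth = [0] * (region_end - region_start)
    for start, end in reads:
        lo = max(start, region_start)
        hi = min(end, region_end)
        for pos in range(lo, hi):
            depth[pos - region_start] += 1
    return depth


def to_image(depth, height):
    """Rasterize a depth track as a bar plot of 0/1 pixels, top row first."""
    peak = max(depth) or 1  # avoid dividing by zero on an empty track
    bars = [d * height // peak for d in depth]
    return [
        [1 if bars[col] >= height - row else 0 for col in range(len(depth))]
        for row in range(height)
    ]


# Hypothetical alignments as (start, end) pairs on the same 10-base region.
reads_a = [(100, 106), (102, 110), (104, 108)]              # middle is covered
reads_b = [(100, 103), (100, 102), (107, 110), (108, 110)]  # middle is skipped

depth_a = coverage(100, 110, reads_a)
depth_b = coverage(100, 110, reads_b)

# Stack the two panels so one image shows both samples on the same region.
image = to_image(depth_a, 4) + to_image(depth_b, 4)
for row in image:
    print("".join("#" if pixel else "." for pixel in row))
```

In the printed picture, the gap in the lower panel is the kind of visual cue an image classifier could pick up on.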

Building the Training Dataset

Images were created for multiple scenarios, including cases where an alternative splicing event occurred and others where it didn’t. A portion of the reads from one sample was mixed with those from the other to simulate the noise that can occur in real data. This process helped make the model more robust.
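The mixing step can be pictured with a short sketch. The 10% fraction, the read names, and the function `mix_reads` are assumptions made for illustration. The summary does not say which fraction was used or how the reads were drawn.

```python
import random


def mix_reads(sample_a, sample_b, fraction, seed=0):
    """Return sample_a plus a random `fraction` of the reads in sample_b."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    n_noise = int(len(sample_b) * fraction)
    return list(sample_a) + rng.sample(list(sample_b), n_noise)


# Hypothetical read identifiers for two simulated samples.
sample_a = [f"a{i}" for i in range(10)]
sample_b = [f"b{i}" for i in range(20)]

noisy_a = mix_reads(sample_a, sample_b, fraction=0.1)
print(len(noisy_a))  # 12: the 10 original reads plus 2 borrowed from sample B
```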

Structure of the Neural Network

To classify the regions of interest, we employed a ResNet50 architecture. This setup adapts to handle different image types, allowing the model to classify events accurately. The final layer produces a single label for each region, determining if a specific alternative splicing event is present.
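What makes a network "residual" is a shortcut that adds a block's input back to its output, so each block only has to learn a correction. The toy below shows that idea with tiny fully connected layers in plain Python. A real ResNet50 uses convolutions, normalization, and many more layers, so treat this as a sketch of the principle and not of the deepSpecas model. All names and weights here are made up.

```python
def relu(values):
    """Set negative entries to zero."""
    return [max(0.0, v) for v in values]


def linear(weights, values):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two linear layers with a ReLU between."""
    hidden = relu(linear(w1, x))
    out = linear(w2, hidden)
    return relu([o + xi for o, xi in zip(out, x)])  # the skip connection


x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]

# With all weights at zero the block passes a non-negative input through unchanged.
print(residual_block(x, zeros, zeros))  # [1.0, 2.0]

# With non-zero weights the block adds a learned correction on top of the input.
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[0.5, 0.0], [0.0, -1.0]]
print(residual_block(x, w1, w2))  # [1.5, 0.0]
```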

Training and Validating the Model

The model was trained using a significant number of images, divided into training and testing sets. A thorough checking system (cross-validation) was implemented to ensure that the model wouldn’t get confused between different scenarios.
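For readers new to cross-validation, here is a minimal sketch of how a dataset can be cut into k folds, each taking one turn as the test set. The number of folds and the way the images were actually split are not given in this summary, so `k=3` and the helper name `k_fold_indices` are assumptions.

```python
def k_fold_indices(n_items, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_items) if i < start or i >= start + size]
        yield train, test
        start += size


# Ten hypothetical images, three folds.
for train, test in k_fold_indices(10, 3):
    print("test:", test, "train size:", len(train))
```

Every image lands in exactly one test fold, so the model is always scored on data it did not see during that round of training.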

Evaluating deepSpecas

To see how well deepSpecas performs, we tested it against real samples of RNA-Seq data. The results showed that the tool could accurately identify between 70% and 80% of different AS events. After refining the dataset to include only reliable events, the performance improved significantly.
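The original abstract reports performance in terms of precision and sensitivity. The sketch below shows how those two numbers are computed from a list of predicted events and a list of trusted ones. The gene names and event labels are invented, and the matching rule (exact equality of gene and event type) is an assumption made to keep the example small.

```python
def precision_and_sensitivity(predicted, truth):
    """Return (precision, sensitivity) for two sets of called events."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    sensitivity = true_positives / len(truth) if truth else 0.0
    return precision, sensitivity


# Hypothetical events as (gene, event type) pairs.
truth = {("geneA", "ES"), ("geneB", "A3"), ("geneC", "IR"), ("geneD", "A5")}
predicted = {("geneA", "ES"), ("geneB", "A3"), ("geneE", "ES")}

precision, sensitivity = precision_and_sensitivity(predicted, truth)
print(f"precision={precision:.2f}, sensitivity={sensitivity:.2f}")
# precision=0.67, sensitivity=0.50
```

Precision asks "how many of the calls were right?", while sensitivity asks "how many of the real events were found?". A tool can trade one for the other.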

Real RNA-Seq Data Analysis

Using actual RNA-Seq data from a specific study, we evaluated deepSpecas further. This data compared samples before and after knocking down certain regulatory proteins. After careful analysis, this yielded a solid set of AS events, which we examined further to ensure accuracy.

The Importance of Manual Inspection

The results showed plenty of promising findings, but not all of the reported events could be trusted. So, we took the extra step of manually inspecting the data to weed out less reliable calls. This process helped us gain the most accurate representation of significant AS events.

Results and Conclusions

In conclusion, deepSpecas stands out as a handy tool for identifying alternative splicing events without the need for a specific gene annotation. It performed well even in noisy datasets, highlighting its potential in various applications.

Moreover, the tool's development included creating a curated dataset to benchmark future AS detection tools. This is a crucial step, as many tools are available, but a reliable means of comparing them didn’t previously exist.

Future Directions

Currently, deepSpecas focuses on specific regions rather than analyzing an entire genome. Future developments aim to enhance its capability for larger-scale investigations. The idea is to run deepSpecas on entire transcriptomes, making it even more powerful in the quest to understand alternative splicing and its implications in health and disease.

In short, thanks to deepSpecas, researchers now have a reliable tool to tackle the complex world of alternative splicing. It's like giving scientists a trusty map to navigate through the genome's nuances, ensuring they find the best pathways to new discoveries!

Original Source

Title: Differential Analysis of Alternative Splicing Events in gene regions using Residual Neural Networks

Abstract: Several computational methods for the differential analysis of alternative splicing (AS) events among RNA-seq samples typically rely on estimating isoform-level gene expression. However, these approaches are often error-prone due to the interplay of individual AS events, which results in different isoforms with locally similar sequences. Moreover, methods based on isoform-level quantification usually need annotated transcripts. In this work, we leverage the ability of deep learning networks to learn features from images, to propose deepSpecas, a novel method for event-based AS differential analysis between two RNA-seq samples. Our method does not rely on isoform abundance estimation, neither on a specific annotation. deepSpecas employs an image embedding scheme to represent the alignments of the two samples on the same region and utilizes a residual neural network to predict the AS events possibly expressed within that region. To our knowledge deepSpecas is the first deep learning approach for performing an event-based AS analysis of RNA-seq samples. To validate deepSpecas, we also address the lack of high quality AS benchmark datasets. For this purpose, we manually curated a set of regions exhibiting AS events. These regions were used for training our model and for comparing our method with state-of-the-art event-based AS analysis tools. Our results highlight that deepSpecas achieves higher precision at the expense of a small reduction in sensitivity. The tool and the manually curated regions are available at https://github.com/sciccolella/deepSpecas.

Authors: Simone Ciccolella, Luca Denti, Jorge Avila Cartes, Gianluca Della Vedova, Yuri Pirola, Raffaella Rizzi, Paola Bonizzoni

Last Update: Nov 3, 2024

Language: English

Source URL: https://www.biorxiv.org/content/10.1101/2024.10.30.621059

Source PDF: https://www.biorxiv.org/content/10.1101/2024.10.30.621059.full.pdf

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to biorxiv for use of its open access interoperability.
