Simple Science

Cutting edge science explained simply

# Biology # Bioinformatics

The Evolving World of Genomics

Discover how genomics is shaping health and medicine through advanced sequencing technologies.

Renato Santos, Hyunah Lee, Alexander Williams, Anastasia Baffour-Kyei, Gerome Breen, Alfredo Iacoangeli

― 7 min read



Genomics is the study of genomes: an organism's complete set of DNA, including all of its genes and how they work. The field has evolved a lot over the years, especially since the Human Genome Project completed its work in 2003, more than twenty years ago. This project produced a reference sequence covering nearly all of the human genome, paving the way for countless studies into genetic variation and how it relates to health and disease.

A Quick Look at DNA

As you may already know, DNA is basically the instruction manual for building and operating a living organism. It is made of long sequences of four chemical building blocks, called bases: adenine (A), cytosine (C), guanine (G), and thymine (T). Just as the letters in a book create words and sentences, these building blocks form the code that tells our bodies what to do.

Genomic Technologies

With technologies improving, scientists can now look at our DNA more closely than ever. This means they can spot differences in our DNA that might affect our health. There are several tools that help researchers look at our genetic make-up, each with its own pros and cons. Let's dive into some of these.

DNA Microarrays

Imagine a tiny garden filled with different flowers, each representing a specific, known piece of DNA. That's somewhat how DNA microarrays work. They are tools that can test for many known DNA variations at once. Think of them as the Ikea of genetics: they assemble different parts into one compact package.

One popular type of DNA microarray is the SNP array. SNPs, or single nucleotide polymorphisms, are tiny changes in DNA that can affect how we respond to drugs, how likely we are to get certain diseases, and more. The advantages of SNP arrays include their cost-effectiveness and their ability to process large amounts of data quickly.

However, there's a catch! They can only identify changes we've already discovered. If a new variation pops up, SNP arrays won’t be able to find it. So, if you're looking to hunt for rare variations, you might want to try something else.
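Here is a toy Python sketch that makes this limitation concrete. It is not a model of real array chemistry. The "array" checks only a fixed, hypothetical list of positions (`KNOWN_SNP_POSITIONS`), while full sequencing compares every position. The sequences are made up for illustration.

```python
# Toy illustration only: real arrays detect variants by probe hybridization,
# not by string comparison. All positions and sequences are hypothetical.
KNOWN_SNP_POSITIONS = {2, 7}  # variants "already discovered" and put on the array


def array_calls(reference: str, sample: str, probes: set[int]) -> list[int]:
    """Report variants only at the pre-selected probe positions."""
    return sorted(p for p in probes if sample[p] != reference[p])


def all_differences(reference: str, sample: str) -> list[int]:
    """Report every position where the sample differs (what sequencing can see)."""
    return [i for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


reference = "ACGTACGTAC"
sample = "ACTTGCGAAC"  # differs from the reference at positions 2, 4 and 7

found = array_calls(reference, sample, KNOWN_SNP_POSITIONS)
missed = sorted(set(all_differences(reference, sample)) - set(found))
print("array found:", found)
print("array missed:", missed)
```

The array reports the variants at positions 2 and 7. It misses the new variant at position 4 because no probe was designed for it.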

Short-read Sequencing

Then we have short-read sequencing, another often-used technique in genomics. It's like reading a book one sentence at a time rather than looking at the whole picture. This method is popular because it’s accurate and relatively cheap. It's perfect for identifying small mutations because it can map tiny pieces of DNA with great confidence.

However, scientists have noticed that it's tricky to read through complex areas of DNA, which are like the twists and turns of a really complicated novel. In these regions, short-read sequencing can misread or miss some parts entirely, leading to gaps in our understanding. It's like trying to find your way through a maze with a flashlight that keeps flickering.

Long-read Sequencing

To tackle these issues, scientists turned to long-read sequencing, which is like reading whole chapters at once. This technology can read much longer sections of DNA in one go, helping to fill in the gaps where short-read techniques sometimes stumble. Oxford Nanopore Technologies (ONT) is a company making waves in this space.
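Why do longer reads help in repetitive regions? Here is a minimal Python sketch, assuming exact string matching. Real aligners tolerate errors and use far cleverer indexes. A short read taken from inside a repeated segment matches the reference in more than one place. A longer read that reaches into unique flanking sequence matches in exactly one. The reference sequence is invented for illustration.

```python
def exact_match_positions(reference: str, read: str) -> list[int]:
    """Return every 0-based start where `read` occurs exactly (overlaps allowed)."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


repeat = "CAGTCAGT"  # a made-up segment that appears twice in the reference
reference = "TTGA" + repeat + "CCTA" + repeat + "GGAT"

short_read = reference[5:11]  # 6 bases from inside the first repeat copy
long_read = reference[2:20]   # 18 bases that also span the unique "CCTA"

print(short_read, exact_match_positions(reference, short_read))
print(long_read, exact_match_positions(reference, long_read))
```

The short read fits equally well at positions 5 and 17, so its true origin is ambiguous. The long read fits only at position 2.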

ONT's method threads a single strand of DNA through a tiny hole, or nanopore, while measuring changes in the electrical current flowing through it. Each stretch of bases disturbs the current in a characteristic way, so scientists can read the DNA as it passes through the pore. The coolest part? These reads can reach lengths of tens of thousands of nucleotides, which is useful for deciphering those complex sections of DNA.

Despite some initial hiccups with accuracy, improvements have led to impressive results. Recent advancements mean these long reads can now be nearly as accurate as short reads, making long-read sequencing a valuable player in genomics research.

Combining Technologies

Researchers often use a mix of technologies to paint a complete picture of the genome. For instance, they might use SNP arrays for a broad overview and then short- and long-read sequencing to dig deeper into specific areas of interest. It’s like using different tools in your toolbox: each one does a particular job better than the others.

The Importance of Sequencing Quality

When conducting genomic research, quality matters. Sequencing quality can affect how confidently researchers can trust the results. Picture this: you’re assembling furniture from Ikea, and the parts are poorly labeled. Would you trust that your chair isn’t going to collapse? Similarly, genomic studies need clear, high-quality data to avoid errors.

Researchers look at several factors, such as sequencing depth (how many times each section of DNA has been read) and read length (how long the individual sequenced DNA fragments are). Higher depth, in particular, tends to lead to better results.
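Here is a rough Python illustration of what depth means. The sketch counts how many reads cover each position of a tiny made-up reference. It treats each aligned read as a hypothetical half-open interval `(start, end)`. On real data, tools such as `samtools depth` report this directly from alignment files.

```python
def per_base_depth(genome_length: int, reads: list[tuple[int, int]]) -> list[int]:
    """Count how many reads cover each position of a 0-based reference.

    Each read is a half-open interval (start, end) with 0 <= start < end <= genome_length.
    Uses a difference array: +1 where a read starts, -1 just past where it ends.
    """
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for i in range(genome_length):
        running += diff[i]
        depth.append(running)
    return depth


reads = [(0, 4), (2, 8), (3, 6), (7, 10)]  # hypothetical read alignments
depth = per_base_depth(10, reads)
mean_depth = sum(depth) / len(depth)  # equals total read bases / genome length
print(depth)
print("mean depth:", mean_depth)
```

Some positions are read three times and others only once, even though the average is 1.6. Real coverage is similarly uneven, which is one reason extra depth helps.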

Sequencing Depth

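In simple terms, if you want to know what's happening in the genome, reading each stretch of DNA more times is usually a good strategy. When researchers increase the depth of sequencing, they often find that the accuracy of identifying genetic variants improves. In the study behind this summary, depth correlated strongly with how well variants were detected. The correlation coefficients ranged from 0.80 for single-letter changes to above 0.9 for small insertions and deletions. However, there’s a point of diminishing returns, much like how cramming for exams only works to a certain extent.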
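Why do returns diminish? A simple, idealized model helps here. It uses the Lander-Waterman assumption that reads land uniformly at random, which real data only approximates. Under that assumption, the number of reads covering a given base follows a Poisson distribution whose mean equals the average depth. The Python sketch below uses this model to estimate the fraction of bases read at least five times. The five-read threshold is a hypothetical stand-in for "enough evidence to call a variant", not a value from the study.

```python
import math


def fraction_at_least(mean_depth: float, min_reads: int) -> float:
    """Poisson estimate of the fraction of bases covered by >= min_reads reads."""
    below = sum(
        math.exp(-mean_depth) * mean_depth**i / math.factorial(i)
        for i in range(min_reads)
    )
    return 1.0 - below


# Each step doubles the depth (and roughly the sequencing cost).
for depth in (5, 10, 20, 40):
    print(f"{depth}x: {fraction_at_least(depth, 5):.4f}")
```

In this model, going from 5x to 10x lifts the fraction from about 56% to about 97%. Going from 10x to 20x adds only about 3 percentage points, even though it costs as much extra sequencing as the first doubling.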

Read Length

Longer reads aren’t always better, but they can help when studying complex regions of the genome. Shorter reads might struggle to provide the full picture, while longer reads could shine in these tricky areas. In some studies, researchers have noticed a slight decline in variant calling performance as read lengths increase. This might seem counterintuitive, but it’s a reminder that genomics is a complex field, and every factor can play a role.
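Read length is usually summarized with more than a simple average. For long-read data, a common summary is the N50: the length L such that reads of length L or longer contain at least half of all sequenced bases. The study's abstract does not say which read-length metrics it used. Treat this Python sketch as a general illustration with made-up read lengths.

```python
def n50(read_lengths: list[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(read_lengths)
    running = 0
    for length in sorted(read_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("need at least one read")


lengths = [3_000, 8_000, 12_000, 15_000, 20_000, 25_000]  # hypothetical reads
print("mean:", sum(lengths) / len(lengths))
print("N50:", n50(lengths))
```

Here the N50 (20,000) sits well above the mean (about 13,800). The metric is weighted toward the long reads that carry most of the sequenced bases.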

Looking at Variants

Variants in DNA are like typos in a book. Some variants are harmless, while others can lead to diseases or affect how we respond to medications. Thus, understanding these variants is crucial for advancing personalized medicine and other areas of healthcare.

Single Nucleotide Variants (SNVs)

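Among the various types of genetic variants, the single nucleotide variant (SNV) is like a single typo in our DNA. Identifying these tiny changes is essential for understanding many conditions. Researchers use various technologies to detect SNVs, and each platform performs differently depending on its strengths and weaknesses. In the study, performance was scored with the F-measure, which runs from 0 to 1 and rewards both finding real variants and avoiding false calls. Long reads (ONT) came close to short reads (Illumina) for SNVs: 0.954 vs 0.968 in high-complexity regions, and 0.763 vs 0.770 in low-complexity regions. Low-complexity regions are the repetitive stretches where both platforms struggled more.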
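The F-measure (also called F1) is the harmonic mean of precision and recall. Precision is the share of calls that are real. Recall is the share of real variants that were called. Here is a minimal Python sketch of the calculation, comparing a call set against a reference set of variants. For simplicity, it assumes that two variants match only when chromosome, position, and alleles are identical. Dedicated benchmarking tools first normalize how variants are written. The variants below are invented.

```python
# A variant is identified here by (chromosome, position, ref allele, alt allele).
Variant = tuple[str, int, str, str]


def precision_recall_f(truth: set[Variant], calls: set[Variant]) -> tuple[float, float, float]:
    tp = len(truth & calls)  # real variants that were called
    fp = len(calls - truth)  # calls with no matching real variant
    fn = len(truth - calls)  # real variants that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
         ("chr2", 40, "G", "A"), ("chr2", 90, "T", "C")}
calls = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
         ("chr2", 40, "G", "A"), ("chr3", 10, "A", "C")}
print(precision_recall_f(truth, calls))
```

Three of the four calls are real, and three of the four real variants were found, so precision, recall, and F-measure are all 0.75.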

Insertions and Deletions (Indels)

Next up are insertions and deletions, or indels, which are a bit like adding or removing words in a sentence. They can change the meaning for better or worse. As with SNVs, researchers look at the performance of different sequencing platforms when detecting these variants.

They find that long reads detect indels well in high-complexity regions (F-measure: 0.850). However, their performance drops sharply in low-complexity, repetitive regions (F-measure: 0.453), which remain some of the trickiest parts of the genome to read. So, once again, there's no one-size-fits-all solution.
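Indels are harder to pin down in repetitive stretches partly because the same change can be written down in several equivalent ways. This is a well-known general issue, offered here as background rather than as the study's explanation. In the made-up Python example below, deleting any one base from a run of A's produces an identical sequence. Two pipelines could therefore report the "same" deletion at different positions, unless both apply a shared convention such as left-alignment (choosing the leftmost equivalent position).

```python
def delete_base(seq: str, pos: int) -> str:
    """Return seq with the single base at 0-based `pos` removed."""
    return seq[:pos] + seq[pos + 1:]


reference = "GCTAAAAAGC"  # made-up sequence with a run of five A's at positions 3-7
observed = delete_base(reference, 7)  # a sample carrying one A fewer

# Every position whose deletion explains the observed sequence equally well:
equivalent = [p for p in range(len(reference)) if delete_base(reference, p) == observed]
print(equivalent)
print("left-aligned position:", min(equivalent))
```

All five positions in the run explain the data equally well. Left-alignment would report the deletion at position 3.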

Structural Variants (SVs)

Now, let's talk about structural variants. Think of these as the rearrangements of chapters in a book or even entire volumes going missing. These variants can be large and complex, and both long-read and short-read sequencing technologies contribute to identifying them.
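Where does an indel end and a structural variant begin? There is no single rule. The study reports structural variants as small as 8 bp (long reads) and 2 bp (short reads), so callers clearly use different cutoffs. This Python sketch assumes the widely used 50 bp cutoff, which is not taken from the study. It classifies simple REF-to-ALT changes by how much they alter sequence length. Structural variants without a length change, such as inversions, would need breakpoint information and are not handled.

```python
# Assumption: 50 bp is a common (but not universal) indel/SV cutoff.
SV_MIN_LENGTH = 50


def classify(ref: str, alt: str) -> str:
    """Classify a simple REF -> ALT change by how much it alters sequence length."""
    size_change = abs(len(alt) - len(ref))
    if size_change == 0:
        return "SNV" if len(ref) == 1 else "multi-base substitution"
    return "SV" if size_change >= SV_MIN_LENGTH else "indel"


print(classify("A", "G"))             # one base swapped for another
print(classify("A", "AT"))            # one base inserted
print(classify("A", "A" + "T" * 60))  # 60 bases inserted
```

With the assumed cutoff, these three calls come out as an SNV, an indel, and an SV. A caller with a lower cutoff would label more of the mid-sized changes as SVs.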

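Long-read sequencing has a distinct advantage when it comes to detecting structural variants. It can pick up on large changes that might go unnoticed with short reads. In the study, long reads identified 2.82 times more structural variants than short reads, across a much broader size range: from 8 base pairs up to 129 million, versus 2 to about 6,000 base pairs for short reads. Hence, researchers can find a greater variety of structural variants by combining results from both platforms.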
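Combining call sets raises a practical question: when do two calls describe the same variant? A common convention, assumed here rather than taken from the study, is 50% reciprocal overlap: each call must cover at least half of the other. The Python sketch keeps every long-read call and adds only the short-read calls that match none of them. All coordinates are hypothetical, and every call is treated as a deletion on a single chromosome.

```python
# Sketch under stated assumptions: every call is a deletion on the same
# chromosome, given as a non-empty half-open interval (start, end).
Interval = tuple[int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """The smaller of the two overlap fractions (overlap / length of each interval)."""
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return min(overlap / (a[1] - a[0]), overlap / (b[1] - b[0]))


def merge_calls(primary: list[Interval], secondary: list[Interval],
                min_overlap: float = 0.5) -> list[Interval]:
    """Keep all primary calls, plus secondary calls that match none of them."""
    merged = list(primary)
    for call in secondary:
        if not any(reciprocal_overlap(call, kept) >= min_overlap for kept in primary):
            merged.append(call)
    return merged


long_read_svs = [(1_000, 3_000), (10_000, 60_000)]   # hypothetical calls
short_read_svs = [(1_100, 2_900), (70_000, 70_300)]  # hypothetical calls

combined = merge_calls(long_read_svs, short_read_svs)
print(combined)
```

The short-read deletion at 1,100 to 2,900 overlaps the long-read call by 90%, so it counts as the same variant. The one at 70,000 is new, giving three distinct structural variants in total.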

The Role of Multiplexing

Researchers often try to save time and money by sequencing multiple samples at once through a method called multiplexing. While this can be a great cost-saving measure, it sometimes impacts the quality of the sequencing.

It's like inviting too many guests to a dinner party: while you can feed everyone at once, the quality of the food might suffer if you overstretch your resources. So, keeping a balance between the number of samples and the quality of sequencing is crucial. Researchers found that multiplexing can slightly lower accuracy in detecting variants, especially structural variants. However, optimizing the process could help mitigate those effects.
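One plausible reason multiplexing costs accuracy is simple arithmetic. Samples sharing a sequencing run split its output, so each sample ends up with lower depth. This back-of-the-envelope Python sketch assumes a hypothetical 100 billion bases per run, an even split between samples, and a human genome of roughly 3.1 billion bases.

```python
# Assumptions: reads are split evenly across samples, and the human
# genome is taken as roughly 3.1 billion bases.
HUMAN_GENOME_BP = 3.1e9


def per_sample_depth(run_yield_bp: float, n_samples: int,
                     genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Average depth each sample gets when one run's output is shared."""
    return run_yield_bp / (n_samples * genome_bp)


run_yield_bp = 100e9  # hypothetical output of a single run: 100 billion bases
for n in (1, 2, 4):
    print(f"{n} sample(s): ~{per_sample_depth(run_yield_bp, n):.1f}x per sample")
```

With these made-up numbers, going from one sample to four cuts per-sample depth from about 32x to about 8x. That is a large drop for any analysis that depends heavily on depth.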

Conclusion and Future Directions

As technology keeps evolving, researchers are excited about the potential of combining different sequencing methods. It opens up doors for exploring the genome further than ever before. Genomics has the potential to impact healthcare by enabling personalized medicine – just think of it as tailoring treatments to your genetic makeup.

With ongoing improvements in sequencing technologies and their integration in research, we can expect deeper insights into human health and disease. After all, in the world of genetics, the game is only just beginning, and the hunt for understanding our DNA is far from over!

Original Source

Title: Investigating the performance of Oxford Nanopore long-read sequencing with respect to Illumina microarrays and short-read sequencing

Abstract: Oxford Nanopore Technologies (ONT) long-read sequencing (LRS) has emerged as a promising tool for genomic analysis, but comprehensive comparisons with established platforms across diverse datasets remain limited. In this study, we present a comprehensive comparison of ONT long-read sequencing (LRS) against Illumina short-read sequencing (SRS) and microarray platforms across 14 human genomes. We performed ONT sequencing using both multiplexed and singleplexed approaches and compared the results with matched Illumina microarray and SRS data. We assessed sequencing quality metrics, variant detection performance for single nucleotide variants (SNVs), insertions/deletions (indels), and structural variants (SVs), while examining the impact of sequencing depth, read length, and multiplexing. ONT LRS demonstrated competitive performance with SRS for SNV detection, particularly in low complexity regions (F-measure: 0.763 vs 0.770), though with slightly lower performance in high complexity regions (F-measure: 0.954 vs 0.968). For indel detection, LRS showed robust performance in high complexity regions (F-measure: 0.850) which, however, decreased in low complexity regions (F-measure: 0.453). LRS identified 2.82 times more structural variants than SRS and detected variants across a broader size range (8 bp to 129 Mb vs 2 bp to 6 kb). Sequencing depth strongly correlated with variant calling performance across all variant types, with correlation coefficients of 0.80 for SNVs in high complexity regions, 0.84 for SNVs in low complexity regions, and exceeding 0.9 for indels. SV detection in LRS showed strong depth dependence (r = 0.939), while SRS SV calls remained stable across depths. Our findings demonstrate that ONT LRS complements existing sequencing technologies, offering advantages in detecting structural variants and analysing low complexity regions, while maintaining competitive performance in standard variant detection.
This study provides practical insights for optimising ONT sequencing strategies and highlights areas for future methodological improvement.

Authors: Renato Santos, Hyunah Lee, Alexander Williams, Anastasia Baffour-Kyei, Gerome Breen, Alfredo Iacoangeli

Last Update: Dec 22, 2024

Language: English

Source URL: https://www.biorxiv.org/content/10.1101/2024.12.19.629409

Source PDF: https://www.biorxiv.org/content/10.1101/2024.12.19.629409.full.pdf

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to biorxiv for use of its open access interoperability.
