Pangenomes Pave the Way in Genetic Testing
New methods help detect large DNA changes, improving disease diagnosis.
Francesco Mazzarotto, Özem Kalay, Elif Arslan, Valeria Cinquina, Deniz Turgut, Rachel J Buchan, Mona Allouba, Valeria Bertini, Sarah Halawa, Pantazis Theotokis, Gungor Budak, Francesca Girolami, Petra Peldova, Jiri Bonaventura, Yasmine Aguib, Marina Colombi, Iacopo Olivotto, Massimo Gennarelli, Milan Macek, Elisabetta Pelo, Marco Ritelli, Magdi Yacoub, Paul JR Barton, H Serhat Tetikol, Roddy Walsh, James S Ware, Amit Jain
In the world of genetic testing, finding large changes in DNA can be a tricky business. These big changes, such as large insertions, deletions, and copy number variants (CNVs), are the elephant in the room: everyone knows they're there, yet they are surprisingly hard to spot. Short-read sequencing of targeted regions, whether by whole-exome sequencing (WES) or specific gene panels, reliably picks up small variants but often misses these larger ones. Despite their popularity and decreasing costs, these targeted approaches still struggle with large genetic changes.
Meanwhile, whole-genome sequencing (WGS) shows promise. Because it covers the entire genome rather than selected regions, it can help scientists identify these large changes more easily. Yet WGS hasn't replaced targeted methods completely, since it tends to be pricier. In clinical settings, cost is a big deal, and most labs still use the tried-and-true methods despite their shortcomings.
The Need for Change
The reliance on older methods means that many labs are missing large variants that could be crucial for diagnosing inherited conditions. Current methods reliably find small harmful changes in genes, but they miss most of the larger ones. Better standards need to be adopted, particularly in busy diagnostic labs that might not have the luxury of a dedicated bioinformatics team.
Enter the superheroes of this narrative: computational tools designed to detect large genetic variants from that pesky short-read data. These tools vary greatly in how well they work, and they analyze the DNA in different ways. The common options include comparing read depth across regions, examining split reads that could point to larger changes, analyzing unusual distances between paired reads, or even reconstructing DNA segments from scratch.
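To make the first of those approaches concrete, here is a minimal sketch of the read-depth idea: compare a sample's coverage in each target region with a reference profile and flag regions where the ratio drops or rises. The function name, the thresholds, and the depth values are all hypothetical, chosen for illustration; this is not how any of the tools in the study are implemented.

```python
from statistics import median

def depth_ratio_calls(sample_depth, reference_depth, low=0.7, high=1.3):
    """Flag regions whose normalized depth ratio hints at a deletion or duplication.

    The `low` and `high` thresholds are illustrative assumptions, not values
    taken from any published tool.
    """
    # Normalize each profile by its own median so that overall coverage
    # differences between the sample and the reference cancel out.
    sample_median = median(sample_depth.values())
    reference_median = median(reference_depth.values())
    calls = {}
    for region, depth in sample_depth.items():
        ratio = (depth / sample_median) / (reference_depth[region] / reference_median)
        if ratio < low:
            calls[region] = ("possible deletion", round(ratio, 2))
        elif ratio > high:
            calls[region] = ("possible duplication", round(ratio, 2))
    return calls

# Hypothetical mean read depths per exon (made-up numbers, not study data).
sample = {"exon1": 200, "exon2": 210, "exon3": 100, "exon4": 190}
reference = {"exon1": 100, "exon2": 105, "exon3": 100, "exon4": 95}

calls = depth_ratio_calls(sample, reference)
print(calls)
```

In this toy input, exon3 has roughly half the expected coverage, which is the pattern a deletion on one of the two gene copies would produce. Real tools model noise far more carefully, which is one reason their results differ so much.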
A New Hope: Pangenome References
Now we reach the exciting part—pangenome references. You see, traditional DNA references are like taking a selfie with just one perfect filter. Pangenomes, on the other hand, are like showcasing a whole album filled with various filters, styles, and backgrounds. They include genetic variations and alternative options from multiple sources, providing a more complete picture.
Thanks to these pangenome references, scientists can map reads more accurately, and that improved mapping makes those elusive large genetic changes easier to spot. Research indicates that using pangenome references in DNA studies, from plants to humans, yields better results overall.
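A toy example may help show why an extra path in the reference matters. The sketch below builds a tiny sequence graph in which one edge skips a segment (a known deletion) and checks whether a read from a deletion carrier matches the linear reference or any path through the graph. The sequences, node names, and exact-match test are illustrative assumptions; real pangenome aligners, GRAF included, are far more sophisticated.

```python
# Toy sequence graph: nodes hold sequence, edges list the allowed next nodes.
nodes = {"a": "ACGTAC", "b": "GGGTTTCCC", "c": "TGCAAT"}
# The edge a -> c skips node b, encoding a known deletion of that segment.
edges = {"a": ["b", "c"], "b": ["c"], "c": []}

def paths_from(node):
    """Spell out every sequence obtainable by walking from `node` to the end."""
    if not edges[node]:
        return [nodes[node]]
    return [nodes[node] + rest for nxt in edges[node] for rest in paths_from(nxt)]

# A conventional linear reference contains only the path a -> b -> c.
linear_reference = nodes["a"] + nodes["b"] + nodes["c"]

# Hypothetical read from a genome that carries the deletion.
read = "GTACTGCA"

matches_linear = read in linear_reference
matches_graph = any(read in path for path in paths_from("a"))
print(matches_linear, matches_graph)  # False True
```

The read has no exact match on the linear reference because it spans the deletion breakpoint, but it matches the alternative path exactly. That is the intuition behind the improved mapping.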
However, the routine use of this advanced approach in clinical settings is still a big question mark. Therefore, researchers have taken it upon themselves to test the waters.
The Study
In a recent study focusing on cardiomyopathy (disease of the heart muscle), researchers evaluated how effective pangenome-based approaches were for detecting large variants. They analyzed a large group of 1969 patients with cardiomyopathy, alongside 1805 healthy controls, all sequenced with a popular panel known as the Illumina TruSight Cardio panel.
Researchers wanted to see if using the pangenome-based GRAF pipeline would make a difference in variant detection compared to three other methods—GATK HaplotypeCaller, Manta, and ExomeDepth. By leveraging the strengths of these different tools, they aimed to figure out where the pangenome stood in comparison.
Why cardiomyopathy, you may ask? Well, heart conditions are common, and researchers knew that large variants could be responsible for some of these cases, even if they accounted for a smaller portion of the overall disease burden.
Results of the Comparison
After running their analyses, the results were quite telling. It turned out that GRAF achieved the highest recall rate in identifying large variants without introducing any false positives. In simpler terms, it was like the star student in the class, answering questions correctly and without any mistakes. Meanwhile, Manta and ExomeDepth struggled, and while they identified some variants, they also made a lot of wrong calls.
Overall, researchers found that while GRAF was leading the pack, there was still room for improvement. The presence of large variants that went undetected indicated that even the best methods could use some extra polishing.
The Technical Stuff
To conduct their analysis, researchers used a total of 3774 samples, which were sequenced using the Illumina TruSight Cardio panel. They focused on genes known to be associated with inherited heart diseases. They categorized and filtered the data carefully to ensure only the most promising variant calls were considered for further lab validation.
The Genes Under Scrutiny
The researchers zeroed in on 23 genes known to contribute to these heart conditions. Some genes were examined for both dilated cardiomyopathy (DCM) and hypertrophic cardiomyopathy (HCM), while others were specific to just one type or the other. It’s like looking for the right ingredients to make a delicious dish; you have to make sure you’ve got the right components in place!
Variant Identification
Using the four different methods mentioned earlier, the team set out to identify large variants in the targeted genes. Each tool took a unique approach to analyzing the data, leading to a mix of results. After extensive filtering and prioritization, they narrowed the list down to 39 variant calls worth investigating further.
The Lab Validation
Once they had their prioritized list, the researchers took those findings to the lab for validation. Out of the 39 variants, only 4 were confirmed. Even though this sounds low, it reflects the general challenge researchers face when trying to validate large variants.
Among the four tools, GRAF emerged as the champion: every call it contributed was confirmed in the lab, a 100% validation rate. Manta and ExomeDepth didn't quite measure up in terms of precision. This highlighted the ongoing struggle in identifying and confirming large genetic variants in clinical settings.
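For readers who want to see how precision, recall, and the F1 score relate, here is a short sketch using the standard definitions. It takes two figures reported for GRAF, no false positives (precision of 1.0) and the F1 score of 0.86 quoted in the study abstract, and works out the recall they imply. The counts in the second half are illustrative and not taken from the study; they are simply one set of numbers consistent with those figures.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def recall_from_f1(precision, f1):
    """Solve F1 = 2PR / (P + R) for R, given P and F1."""
    return f1 * precision / (2 * precision - f1)

# Recall implied by a precision of 1.0 and an F1 score of 0.86.
implied_recall = recall_from_f1(precision=1.0, f1=0.86)
print(round(implied_recall, 2))  # 0.75

# Illustrative counts (an assumption, not study data): 3 true variants found,
# no false calls, 1 true variant missed.
precision, recall, f1 = precision_recall_f1(tp=3, fp=0, fn=1)
print(precision, recall, round(f1, 2))  # 1.0 0.75 0.86
```

The takeaway matches the article's point: perfect precision does not mean every variant was found, and the F1 score captures both aspects in one number.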
The Hidden Burden
Despite the small number of validated variants, the study also offered an estimate of the burden of large pathogenic variants in DCM and HCM. In DCM, the researchers calculated a burden of 0.22%, while in HCM, the burden stood at 0.1%. The background rate in healthy individuals was even lower, at 0.05%.
These numbers indicate that large variants might not be the primary culprits in the overall disease burden but could still play a role in specific cases. The findings emphasize that while these variants may not be predominant, they shouldn't be ignored.
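One simple way to read these percentages is to compare each patient group with the background rate in controls. The sketch below does that arithmetic on the three reported figures. Treating the difference as an "excess" burden is an interpretive assumption made here for illustration, and with variants this rare the comparison says nothing about statistical significance.

```python
# Reported burden of large pathogenic variants, in percent.
burden_pct = {"DCM": 0.22, "HCM": 0.10, "controls": 0.05}

background = burden_pct["controls"]

# Difference from the control rate, in percentage points.
excess_points = {
    disease: round(burden_pct[disease] - background, 2)
    for disease in ("DCM", "HCM")
}

# Ratio to the control rate (how many times more frequent than background).
fold_over_background = {
    disease: round(burden_pct[disease] / background, 1)
    for disease in ("DCM", "HCM")
}

print(excess_points)         # {'DCM': 0.17, 'HCM': 0.05}
print(fold_over_background)  # {'DCM': 4.4, 'HCM': 2.0}
```

Even after subtracting the background rate, the excess stays well below one percentage point in both conditions, which is why large variants look like a minor contributor overall.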
Limitations and Challenges
Even with the promising GRAF approach, the study showed that identifying large variants from short-read sequencing data remains difficult. While GRAF displayed excellent precision, its overall recall could still use some work.
Different Tools, Different Results
The study revealed that each of the tools had its strengths and weaknesses. While GRAF was the best at identifying true large variants, GATK HaplotypeCaller (GATKhc) excelled at detecting smaller variants. Manta, while less precise, could still identify some complex variants that GRAF might miss. In short, it's a mixed bag!
Conclusion
In the end, this research serves as a reminder of the complexities involved in genetic testing. While pangenome-based approaches like GRAF show promise in enhancing the detection of large variants, the journey toward reliable genetic diagnosis is still filled with challenges.
The study lays the groundwork for future advancements in this exciting field, and perhaps one day, genetic testing will be as easy as pie—where everyone gets their well-deserved slice without missing out on the big changes lurking beneath the surface. The hope is that as these methods become more widely adopted, they will lead to improved diagnosis and treatment of genetic disorders without breaking the bank.
So, here’s to pangenomes, breakthroughs, and the hope that they’ll continue to improve how we understand and treat genetic conditions. Because if there’s one thing we know, it’s that genetics can be as tangled as spaghetti!
Original Source
Title: PANGENOMES AID ACCURATE DETECTION OF LARGE INSERTION AND DELETIONS FROM GENE PANEL DATA: THE CASE OF CARDIOMYOPATHIES
Abstract: Gene panels represent a widely used strategy for genetic testing in a vast range of Mendelian disorders. While this approach aids reliable bioinformatic detection of short coding variants, it fails to detect most larger variants. Recent studies have recommended the adoption of pangenomes to augment detection of large variants from targeted sequencing, potentially providing diagnostic laboratories with the possibility to streamline diagnostic work-ups and reduce costs. Here, we analyze a large-scale cohort comprising 1,952 cardiomyopathy cases and 1,805 technically matched controls and show that a pangenome-based workflow, GRAF, conjugates higher precision and recall (F1 score 0.86) compared with conventional orthogonal methods (F1 0-0.57) in detecting potentially pathogenic ≥20 bp variants from short-read panel data. Our results indicate that pangenome-based workflows aid precise and cost-effective detection of large variants from targeted sequencing data in the clinical context. This will be particularly relevant for conditions in which these variants explain a high proportion of the disease burden.
Authors: Francesco Mazzarotto, Özem Kalay, Elif Arslan, Valeria Cinquina, Deniz Turgut, Rachel J Buchan, Mona Allouba, Valeria Bertini, Sarah Halawa, Pantazis Theotokis, Gungor Budak, Francesca Girolami, Petra Peldova, Jiri Bonaventura, Yasmine Aguib, Marina Colombi, Iacopo Olivotto, Massimo Gennarelli, Milan Macek, Elisabetta Pelo, Marco Ritelli, Magdi Yacoub, Paul JR Barton, H Serhat Tetikol, Roddy Walsh, James S Ware, Amit Jain
Last Update: 2024-12-02 00:00:00
Language: English
Source URL: https://www.medrxiv.org/content/10.1101/2024.11.27.24318059
Source PDF: https://www.medrxiv.org/content/10.1101/2024.11.27.24318059.full.pdf
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to medrxiv for use of its open access interoperability.