New Methods in Evolutionary Biology: Protein Structures vs. Protein Sequences
Researchers test whether protein structures can reveal evolutionary relationships better than protein sequences do.
Giacomo Mutti, Eduard Ocaña-Pallarès, Toni Gabaldón
In the world of biology, scientists are always trying to figure out how different living things are related to each other. This is like solving a giant family tree, but instead of your uncle and aunt, we're talking about all sorts of living creatures, from tiny bacteria to big blue whales. As researchers gather more information about the genomes of different species, they're able to ask more complicated questions about how life evolved over time. However, there are some challenges when it comes to using traditional methods to map out these relationships, especially when the organisms are very different from one another.
The Challenges of Traditional Methods
Traditional methods for studying the relationships between species often rely on comparing their DNA or protein sequences: the more similar two sequences are, the more closely related the species carrying them are likely to be. But here's the catch: over long evolutionary timescales, sequences accumulate so many changes that the signal of shared ancestry fades, and it becomes hard to see a clear connection. It’s like trying to tell if two paintings are from the same artist when one is a modern piece and the other is an ancient masterpiece. Tough, right?
To get around this issue, scientists started thinking outside the box and considered using the three-dimensional shapes of proteins instead of just their sequences. The idea is that protein structures are thought to change more slowly than sequences, so they might still reveal relationships after the sequence signal has faded. However, experimentally determined structures have been available for only a small fraction of proteins, which has been a roadblock for large-scale studies.
A Game Changer: AlphaFold2
Then came a game changer: AlphaFold2. This new tool has made it much easier to predict protein structures, and it has opened up new doors in the study of biology. Researchers are now able to look at protein structures across many different species at a scale that was previously thought impossible. With this advancement, new software tools have popped up, including Foldseek, which helps scientists align protein structures quickly and efficiently.
The Power of Protein Structures
With the advent of Foldseek and similar tools, researchers set out to test how well protein structures can be used to determine the relationships between human genes and the genes of other species. To do this, they built a ‘human phylome’: a collection of gene family trees, one for each human gene, showing how it relates to its counterparts in other living things.
The process starts with selecting a diverse set of eukaryotic species, which are organisms whose cells have a nucleus. Researchers gather predicted protein structures for these species from a database and discard any structure that falls below a quality threshold. They then compare the human proteins to those of the selected species using both traditional sequence-based methods and the new structure-based methods.
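The filtering step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline. It assumes that quality is judged by AlphaFold2's per-residue confidence score (pLDDT, 0 to 100) averaged over each model. The cutoff of 70 and the `PredictedStructure` record are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PredictedStructure:
    """Hypothetical record for one predicted protein model."""
    protein_id: str
    species: str
    plddt: list[float]  # per-residue confidence scores (0-100)


def passes_quality(s: PredictedStructure, min_mean_plddt: float = 70.0) -> bool:
    """Keep a model only if its mean confidence reaches the (hypothetical) cutoff."""
    # Empty score lists are rejected instead of letting mean() raise an error.
    return len(s.plddt) > 0 and mean(s.plddt) >= min_mean_plddt


def filter_structures(structures: list[PredictedStructure],
                      min_mean_plddt: float = 70.0) -> list[PredictedStructure]:
    """Return the structures that pass the quality filter, in their original order."""
    return [s for s in structures if passes_quality(s, min_mean_plddt)]


if __name__ == "__main__":
    models = [
        PredictedStructure("P1", "human", [92.0, 88.0, 75.0]),
        PredictedStructure("P2", "yeast", [55.0, 61.0, 40.0]),
    ]
    kept = filter_structures(models)
    print([s.protein_id for s in kept])  # ['P1']
```

Averaging over the whole model is the simplest choice. A real pipeline might instead require a minimum fraction of confident residues.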
Understanding the Results
When looking for matches (candidate homologs), researchers used two main methods: BLASTP, the traditional sequence comparison tool, and Foldseek, which compares protein structures. The two methods often disagreed: only a small portion of the matching pairs were found by both. BLASTP found many matches that Foldseek missed. Foldseek, in turn, uncovered some connections in the so-called ‘twilight zone’ of sequence similarity (roughly 20–35% identity), where sequences are too different to compare reliably.
This research showed that Foldseek can pick up distant relatives that sequence searches miss. However, it also overlooks some established relationships that BLASTP easily finds, and its longer lists of candidates contain a higher fraction of false positives. It’s like finding a long-lost cousin in a distant part of the world while missing the family reunion down the street.
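The overlap between the two searches can be measured by treating each tool's output as a set of (query, target) pairs. The sketch below is an illustration, not the study's code. It assumes tab-separated output whose first two columns are the query and target IDs, which holds for BLAST's tabular `-outfmt 6` and for Foldseek's default `easy-search` output. It also assumes both tools already use the same protein identifiers; in practice this may need a mapping step. The Jaccard index is simply one common overlap summary chosen for this example, not necessarily the study's metric.

```python
from typing import Iterable

Pair = tuple[str, str]


def read_pairs(lines: Iterable[str]) -> set[Pair]:
    """Collect unique (query, target) pairs from tab-separated search output."""
    pairs: set[Pair] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        query, target = line.rstrip("\n").split("\t")[:2]
        if query != target:  # ignore self-hits
            pairs.add((query, target))
    return pairs


def compare_hits(seq_hits: set[Pair], struct_hits: set[Pair]) -> dict[str, float]:
    """Count shared and tool-specific pairs, plus their Jaccard overlap."""
    shared = seq_hits & struct_hits
    union = seq_hits | struct_hits
    return {
        "shared": len(shared),
        "sequence_only": len(seq_hits - struct_hits),
        "structure_only": len(struct_hits - seq_hits),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical hit lines; only the first two columns are used.
    blastp = ["HUMAN1\tMOUSE1\t92.1\n", "HUMAN1\tYEAST7\t31.0\n", "HUMAN1\tHUMAN1\t100.0\n"]
    foldseek = ["HUMAN1\tYEAST7\t0.28\n", "HUMAN1\tPLANT3\t0.19\n"]
    print(compare_hits(read_pairs(blastp), read_pairs(foldseek)))
```

Counting pairs rather than raw lines means that repeated hits between the same two proteins (for example, several local alignments) count only once.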
Which Method Works Better?
With both methods in play, researchers needed to see which one produced better gene family trees, known as phylogenies. They used several measures of tree accuracy, such as how well the trees agreed with known species groupings.
As it turns out, the best sequence-based methods performed better than the structure-based ones in almost all tested scenarios. This suggests that while protein structures have their uses, sequences remain the more reliable default for tracing evolutionary relationships at large scale.
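One simple way to check a gene tree against known species groupings is to ask whether each group forms a single clade, that is, whether the group is monophyletic. The sketch below is a simplified illustration, not one of the study's actual accuracy measures. It assumes a rooted tree written as nested Python tuples and one gene copy per species, so gene duplications are ignored. The species names and group definitions are hypothetical.

```python
from typing import Union

# A tree is either a leaf name or a tuple of child subtrees (rooted).
Tree = Union[str, tuple]


def collect_clades(tree: Tree, clades: list[frozenset]) -> frozenset:
    """Return the leaves under `tree`, recording every clade's leaf set in `clades`."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(collect_clades(child, clades) for child in tree))
    clades.append(leaves)
    return leaves


def congruence_fraction(tree: Tree, groups: dict[str, set[str]]) -> float:
    """Fraction of known groups whose members present in the tree form one clade."""
    clades: list[frozenset] = []
    all_leaves = collect_clades(tree, clades)
    clade_set = set(clades)
    # Only score groups with at least two members present in this tree.
    scored = [frozenset(g) & all_leaves for g in groups.values()]
    scored = [g for g in scored if len(g) >= 2]
    if not scored:
        return 0.0
    return sum(g in clade_set for g in scored) / len(scored)


if __name__ == "__main__":
    gene_tree = ((("human", "chimp"), "mouse"), ("yeast", "fission_yeast"))
    known_groups = {
        "primates": {"human", "chimp"},
        "mammals": {"human", "chimp", "mouse"},
        "fungi": {"yeast", "fission_yeast"},
    }
    print(congruence_fraction(gene_tree, known_groups))  # 1.0
```

Because monophyly depends on where the tree is rooted, the same unrooted tree can score differently under different rootings. Real gene trees with duplications need more careful handling than this sketch provides.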
The Takeaway
So, what’s the bottom line? While using protein structures offers new insights into evolution, it doesn’t quite outshine the traditional methods just yet. The exploration into protein structures is exciting and has opened up new avenues for research, but it seems that there’s still room for improvement in structure-based methods.
A Final Funny Thought
If search tools were guests at a party, some would be great at mingling and making connections with the whole family (like BLASTP). Others might be a bit quirky and strike up conversations with distant relatives (hello, Foldseek). But together, they could throw one fantastic evolutionary reunion if they play their cards right!
Original Source
Title: Newly developed structure-based methods do not outperform standard sequence-based methods for large-scale phylogenomics
Abstract: Recent developments in protein structure prediction have allowed the use of this previously limited source of information at genome-wide scales. It has been proposed that the use of structural information may offer advantages over sequences in phylogenetic reconstruction, due to their slower rate of evolution and direct correlation to function. Here, we examined how recently developed methods for structure-based homology search and tree reconstruction compare to current state-of-the-art sequence-based methods in reconstructing genome-wide collections of gene phylogenies (i.e. phylomes). While structure-based methods can be useful in specific scenarios, we found that their current performance does not justify using the newly developed structure-based methods as a default choice in large-scale phylogenetic studies. On the one hand, the best performing sequence-based tree reconstruction methods still outperform structure-based methods for this task. On the other hand, structure-based homology detection methods provide larger lists of candidate homologs, as previously reported. However, this comes at the expense of missing hits identified by sequence-based methods, as well as providing homolog candidate sets with higher fractions of false positives. These insights help guide the use of structural data in comparative genomics and highlight the need to continue improving structure-based approaches. Our pipeline is fully reproducible and has been implemented in a snakemake workflow. This will facilitate a continuous assessment of future improvements of structure-based tools in the Alphafold era.
Authors: Giacomo Mutti, Eduard Ocaña-Pallarès, Toni Gabaldón
Last Update: 2024-12-20
Language: English
Source URL: https://www.biorxiv.org/content/10.1101/2024.08.02.606352
Source PDF: https://www.biorxiv.org/content/10.1101/2024.08.02.606352.full.pdf
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to bioRxiv for use of its open access interoperability.