Simple Science

Cutting edge science explained simply


The Pathways of Protein Evolution

Discover how proteins evolve through tiny changes and connections.

Pranav Kantroo, Günter P. Wagner, Benjamin B. Machta


Proteins are the workhorses of our cells. They do a lot of important jobs, like breaking down food, sending signals, and giving cells their shape. Made from chains of amino acids, proteins fold into unique shapes that determine what they can do. Think of them like complex machines: if you change one part, you might get a different outcome.

The Mystery of Protein Evolution

Now, imagine evolution as a long, twisty road full of detours. Over time, a protein’s sequence changes through tiny mutations, each one like a small turn along the way. After many such turns, some proteins end up looking and acting very differently, while others still look quite similar even after a long journey.

But here’s the catch: how do scientists figure out how these proteins are related, especially if they look and act like distant cousins? That’s where things get tricky!

The Language of Protein Sequences

Just as every language has its own grammar, proteins have a way of “speaking” through their sequences, and scientists have built tools to understand this language. One popular tool is a protein language model: trained on very large collections of natural protein sequences, it learns which amino acids tend to appear where, and it can estimate how plausible a new sequence is as a real, working protein.

You can think of the model as a fluent reader of this language. Even when two proteins seem quite different, it can help reveal connections that are not obvious at first glance.

Creating Paths Between Proteins

Here’s a fun idea: what if we could create paths between proteins, like connecting the dots between two points? That’s exactly what scientists are trying to do! By making one small change at a time (swapping one amino acid for another, or adding or removing a single one), it’s possible to see how a protein could, in principle, evolve from one form into another.

Imagine the word game where you change one letter at a time to turn one word into another, but with proteins! Instead of words, you have amino acid sequences, and the aim is to keep every intermediate sequence viable until you reach the target protein.
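To make the “moves” in this game concrete, here is a minimal Python sketch that lists every sequence one step away from a given sequence. It assumes the three moves the original paper’s abstract lists as admissible: substituting, inserting, or deleting a single residue. The function name `single_step_neighbors` is a hypothetical illustration, not the authors’ code.

```python
# The 20 standard amino acids, as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_step_neighbors(seq: str) -> set:
    """Return every sequence reachable from `seq` by one substitution,
    one insertion, or one deletion of a single residue."""
    neighbors = set()
    for i in range(len(seq)):
        # Deletion: drop the residue at position i.
        neighbors.add(seq[:i] + seq[i + 1:])
        # Substitution: put a different amino acid at position i.
        for aa in AMINO_ACIDS:
            if aa != seq[i]:
                neighbors.add(seq[:i] + aa + seq[i + 1:])
    # Insertion: add one amino acid in any of the len(seq) + 1 gaps.
    for i in range(len(seq) + 1):
        for aa in AMINO_ACIDS:
            neighbors.add(seq[:i] + aa + seq[i:])
    return neighbors


print(len(single_step_neighbors("MK")))
```

For the two-residue toy sequence "MK" this yields 98 distinct neighbors (a few insertions coincide, such as "MMK"). The count grows roughly in proportion to sequence length, so a real search has to be choosy about which moves it follows.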

The Protein Road Trip

Let’s take a road trip analogy. If you start from one protein (let’s call it Protein A) and want to reach Protein B, you can only make one small change, a single mutation, at a time. Every time you make a change, you want to make sure the new protein is still functional, just as you want to make sure the car doesn’t break down in the middle of your road trip.

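To plan this road trip, the researchers use a search strategy called “beam search.” Instead of committing to a single route, beam search keeps a handful of the most promising partial routes at every step and drops the rest. It’s like a GPS that weighs several routes at once and, at every stop, checks whether your car (or protein) is still running smoothly. This helps find high-fitness paths through the protein landscape, although, like any search that prunes its options, it is not guaranteed to find the single best one.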
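As a rough illustration of how beam search could be applied here, the Python sketch below makes several assumptions the summary does not spell out: fitness comes from a hypothetical `toy_fitness` function (a made-up stand-in for ESM2), each step must bring the sequence exactly one edit closer to the target, and partial paths are ranked by their weakest link, the lowest fitness seen so far. The beam width of 3 and all function names are arbitrary choices for illustration, not the authors’ code.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
FAVORED = set("AILMFVW")  # hypothetical "favored" residues, chosen arbitrarily


def toy_fitness(seq: str) -> float:
    """Made-up fitness (NOT ESM2): fraction of residues in FAVORED."""
    return sum(aa in FAVORED for aa in seq) / max(len(seq), 1)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]


def single_step_neighbors(seq: str) -> set:
    """All sequences one substitution, insertion, or deletion away."""
    out = set()
    for i in range(len(seq)):
        out.add(seq[:i] + seq[i + 1:])
        out.update(seq[:i] + aa + seq[i + 1:] for aa in AMINO_ACIDS if aa != seq[i])
    for i in range(len(seq) + 1):
        out.update(seq[:i] + aa + seq[i:] for aa in AMINO_ACIDS)
    return out


def beam_search_path(start: str, target: str, fitness, beam_width: int = 3):
    """Keep the `beam_width` partial paths whose least-fit sequence is highest,
    extending each only by moves that get exactly one step closer to `target`."""
    beams = [(fitness(start), [start])]  # (weakest fitness so far, path)
    for _ in range(edit_distance(start, target)):
        best_by_end = {}  # keep only the best candidate per endpoint sequence
        for weakest, path in beams:
            remaining = edit_distance(path[-1], target)
            for nxt in single_step_neighbors(path[-1]):
                if edit_distance(nxt, target) != remaining - 1:
                    continue  # this move would not shorten the trip
                cand = (min(weakest, fitness(nxt)), path + [nxt])
                if nxt not in best_by_end or cand[0] > best_by_end[nxt][0]:
                    best_by_end[nxt] = cand
        beams = sorted(best_by_end.values(), key=lambda c: c[0], reverse=True)[:beam_width]
    weakest, path = beams[0]
    return path, weakest


path, weakest = beam_search_path("MKTAYIAK", "MKLAYIK", toy_fitness)
for step in path:
    print(step, round(toy_fitness(step), 3))
```

Ranking by the weakest link, rather than by the fitness of the latest sequence, mirrors the road-trip rule: one breakdown anywhere ruins the whole trip. A larger beam width explores more routes at a higher computational cost.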

Outright Homologs: Close Relatives

Let’s start with proteins known as outright homologs. These are like siblings: they share a large part of their sequence, much as siblings share the same parents. For example, related enzymes that help bacteria resist antibiotics have very similar sequences.
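How close two “siblings” are is often expressed as sequence identity: the percentage of aligned positions where two proteins carry the same amino acid. The summary does not say which similarity measure the authors used to group their protein pairs, so the Python sketch below shows just one common convention, applied to two made-up sequences that are assumed to be aligned already (producing the alignment is a separate step). The function `percent_identity` is a hypothetical helper.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences have the same residue.

    Columns where both sequences have a gap are skipped. Other conventions
    (e.g. dividing by the shorter sequence's length) give different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == gap and y == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


# Two made-up, already-aligned toy sequences ("-" marks a gap).
print(percent_identity("MKTAYIAK", "MKLAYI-K"))  # 75.0
```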

When scientists connected these sibling proteins, they found that, with one small mutation at a time, the path stayed functional the whole way. It’s like driving from one family reunion to another through familiar neighborhoods without hitting any dead ends.

Distant Homologs: More Complicated Family Trees

Now let’s spice things up with distant homologs. These proteins might be like second or third cousins who have changed a lot over time but still share some family traits.

One such family includes proteins that help bacteria respond to different environments. Although these proteins are more divergent, researchers found that it’s still possible to connect these distant relatives. The paths may pass through some odd or less stable intermediates, like taking a detour through an unfamiliar area.

Speculative Homologs: The Mystery Cousins

Next on our protein journey are the speculative homologs. These proteins are like far-off cousins that don't look much alike. They serve different purposes, but there are quirky similarities that suggest they may have a common ancestor.

For instance, lactate dehydrogenase and NADH peroxidase are two different proteins that do very different jobs. However, some parts of their structure hold onto ghostly similarities. Scientists have found paths between these proteins that take some unexpected twists and turns, often going through unstable states that make you wonder if you’ve lost your way.

The Challenges of Finding Paths

Finding paths between proteins isn’t as straightforward as it seems. It’s a bit like rebuilding a bridge that has crumbled over time. While it’s easy to find paths between similar proteins (like siblings), connecting distant relatives or mysterious cousins often means crossing stretches where the intermediate proteins are less likely to work.

Sometimes, along these paths, intermediate proteins may lose their stable structure or function. It’s like a car that can’t handle a rough road and breaks down. To stay on course, scientists score every intermediate sequence and check that the path remains viable.

Measuring Fitness along the Path

As scientists build protein paths, they also need a way to measure how “fit” each variant is. In this study, fitness is scored with the protein language model ESM2, which estimates how plausible each sequence is as a functional protein.
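The summary does not say exactly how ESM2’s outputs are turned into a single fitness number. As a rough illustration of the idea behind language-model scores, here is a toy Python sketch that scores a sequence by the average log-probability of its residues under a hypothetical, hand-written probability table. A real model like ESM2 computes context-dependent probabilities with a neural network; the table `TOY_PROFILE` and the function `mean_log_likelihood` are made-up stand-ins.

```python
import math

# Hypothetical per-position residue probabilities for a 4-residue toy protein.
# A real protein language model predicts these from the surrounding sequence;
# this fixed table is only a stand-in for illustration.
TOY_PROFILE = [
    {"M": 0.9, "L": 0.1},
    {"K": 0.6, "R": 0.4},
    {"T": 0.5, "S": 0.5},
    {"A": 0.7, "G": 0.3},
]
FLOOR = 1e-3  # small probability for residues the table never expects


def mean_log_likelihood(seq: str, profile=TOY_PROFILE) -> float:
    """Average log-probability per residue: higher means more 'natural-looking'."""
    if len(seq) != len(profile):
        raise ValueError("this toy profile only scores sequences of its own length")
    total = sum(math.log(profile[i].get(aa, FLOOR)) for i, aa in enumerate(seq))
    return total / len(seq)


for seq in ["MKTA", "MRSG", "WWWW"]:
    print(seq, round(mean_log_likelihood(seq), 3))
```

Here "MKTA" scores highest, "MRSG" lower, and "WWWW", built entirely from residues the table does not expect, far lower. Averaging per residue keeps scores comparable across sequences of different lengths, which matters once insertions and deletions are allowed.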

Imagine if you had a fitness tracker that tells you how well you’re doing on your road trip. If you hit a few bumpy patches, your tracker might beep to warn you that it’s time to change routes before the car (or protein) gives up.

Comparing Different Paths

To check that the search is doing useful work, scientists compare its paths with random ones. The paths found by beam search tend to keep higher fitness along the way than random paths do.

Random paths can occasionally turn up surprises, but they often lead to dead ends or, worse, breakdowns! By tracking how “fit” the proteins remain throughout their journey, scientists can refine their techniques to find better paths.
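To see why random paths tend to fare worse, here is a toy Python comparison under simplifying assumptions: the two sequences have equal length, only substitutions are allowed (the paper also allows insertions and deletions), and fitness comes from a hypothetical additive table rather than ESM2. A path is then just the order in which the differing positions are switched, and each path is judged by its weakest link, the lowest fitness along the way. Because there are only four differences, an exhaustive search over all orderings stands in for beam search here. All names are illustrative.

```python
import itertools
import math
import random

# Hypothetical per-position residue probabilities (a stand-in, NOT ESM2).
PROFILE = [
    {"M": 0.9, "L": 0.1},
    {"K": 0.6, "R": 0.4},
    {"T": 0.5, "S": 0.5},
    {"A": 0.7, "G": 0.3},
]


def toy_fitness(seq: str) -> float:
    """Average log-probability per residue under the toy table."""
    return sum(math.log(PROFILE[i].get(aa, 1e-3)) for i, aa in enumerate(seq)) / len(seq)


def path_from_order(start: str, target: str, order) -> list:
    """Switch differing positions to the target residue one at a time, in `order`."""
    current = list(start)
    path = [start]
    for i in order:
        current[i] = target[i]
        path.append("".join(current))
    return path


def weakest_link(path) -> float:
    """Lowest fitness anywhere on the path: where a protein would most likely fail."""
    return min(toy_fitness(s) for s in path)


start, target = "LKTA", "MRSG"
diffs = [i for i, (a, b) in enumerate(zip(start, target)) if a != b]

# A random path: switch the differing positions in a shuffled order.
rng = random.Random(0)
random_order = diffs[:]
rng.shuffle(random_order)
random_path = path_from_order(start, target, random_order)

# With only 4 differences we can try all 4! = 24 orderings; a beam search
# approximates this when there are far too many orderings to enumerate.
best_path = max(
    (path_from_order(start, target, order) for order in itertools.permutations(diffs)),
    key=weakest_link,
)

print("random:", random_path, round(weakest_link(random_path), 3))
print("best:  ", best_path, round(weakest_link(best_path), 3))
```

Every path passes through both endpoints, so no ordering can do better than the weaker endpoint. The best orderings reach that bound by making the helpful swap (position 0, L to M) first, while a random ordering may apply the harmful swaps first and dip lower.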

Lessons Learned

In creating viable paths between protein sequences, scientists discovered quite a bit about how evolution works. They found that while it’s possible to connect outright homologs with ease, distant homologs and speculative homologs can take a little more finesse.

Part of the appeal of this work is practical: how easily two sequences can be connected by a high-fitness path could serve as a clue to whether they share a common ancestor, uncovering hidden connections between proteins that look unrelated.

Conclusion: The Road Ahead

As scientists continue to map the journey of proteins, they are not just finding paths but also unlocking secrets about how life adapts. This knowledge could help us predict how proteins might evolve in the future, potentially leading to new advances in medicine and biotechnology.

So, the next time you think about evolution, just remember: it’s a winding road full of unexpected twists, turns, and the occasional detour that leads to a beautiful new destination!

Original Source

Title: High fitness paths can connect proteins with low sequence overlap

Abstract: The structure and function of a protein are determined by its amino acid sequence. While random mutations change a protein's sequence, evolutionary forces shape its structural fold and biological activity. Studies have shown that neutral networks can connect a local region of sequence space by single residue mutations that preserve viability. However, the larger-scale connectedness of protein morphospace remains poorly understood. Recent advances in artificial intelligence have enabled us to computationally predict a protein's structure and quantify its functional plausibility. Here we build on these tools to develop an algorithm that generates viable paths between distantly related extant protein pairs. The intermediate sequences in these paths differ by single residue changes over subsequent steps - substitutions, insertions and deletions are admissible moves. Their fitness is evaluated using the protein language model ESM2, and maintained as high as possible subject to the constraints of the traversal. We document the qualitative variation across paths generated between progressively divergent protein pairs, some of which do not even acquire the same structural fold. The ease of interpolating between two sequences could be used as a proxy for the likelihood of homology between them.

Authors: Pranav Kantroo, Günter P. Wagner, Benjamin B. Machta

Last Update: 2024-11-13

Language: English

Source URL: https://arxiv.org/abs/2411.09054

Source PDF: https://arxiv.org/pdf/2411.09054

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arxiv for use of its open access interoperability.
