Sci Simple

Advancements in Protein Structure Prediction

New models improve how scientists predict protein shapes and interactions.

Toshiyuki Oda



Protein structure prediction is a significant and complex challenge in biology. Proteins are vital for nearly all biological processes, and their functions depend heavily on their shapes. The relationship resembles a key fitting a lock: if the shape is not just right, nothing works. For years, scientists have sought methods to predict these intricate shapes computationally, because determining them by experiment is time-consuming and expensive.

The Role of AlphaFold and Its Successor

A notable advancement in this field is AlphaFold2, a system developed by DeepMind. In the CASP14 competition, AlphaFold2 outperformed its rivals, modeling more than 90% of targets with high accuracy and showing its potential to revolutionize protein structure prediction.

Following this success, DeepMind released AlphaFold-Multimer, which extends AlphaFold to predict how multiple proteins interact with each other. This matters because many proteins do not operate in isolation. As it turns out, predicting how proteins come together, like pieces of a puzzle, is more complex than predicting a single protein's shape.

The Challenges in Multimer Predictions

Despite the impressive performance of AlphaFold-Multimer, there is still room for improvement in predicting the structures of protein complexes, known as multimers. Although it succeeds on some multimers, accuracy drops significantly for certain types, particularly immune-related proteins.

Researchers have noted that this challenge appears to stem from a couple of issues. First, current methods often rely on co-evolution information, which means they look at how different proteins have evolved alongside one another. To use this information, scientists must find the correct sequence pairs, which can be tricky. Many proteins have similar versions, called paralogs, and sorting these out is no easy feat.
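To make the pairing problem concrete, here is a minimal Python sketch of one simple strategy: pair homologs of two chains only when a species contributes exactly one candidate for each chain, and flag species with several candidates (possible paralogs) as ambiguous. This is a toy illustration, not the pairing procedure used by AlphaFold-Multimer; the function name `pair_by_species`, the input format, and the sequence IDs are all hypothetical.

```python
from collections import defaultdict


def pair_by_species(hits_a, hits_b):
    """Pair homologs of chain A and chain B found in the same species.

    hits_a, hits_b: lists of (species, sequence_id) tuples (hypothetical format).
    Returns (pairs, ambiguous_species).
    """
    by_species_a = defaultdict(list)
    by_species_b = defaultdict(list)
    for species, seq_id in hits_a:
        by_species_a[species].append(seq_id)
    for species, seq_id in hits_b:
        by_species_b[species].append(seq_id)

    pairs, ambiguous = [], []
    # Only species that have hits for both chains can contribute a pair.
    for species in sorted(set(by_species_a) & set(by_species_b)):
        a_ids, b_ids = by_species_a[species], by_species_b[species]
        if len(a_ids) == 1 and len(b_ids) == 1:
            pairs.append((a_ids[0], b_ids[0]))
        else:
            # Several candidates in one species: possible paralogs, so the
            # correct partner cannot be chosen from the species label alone.
            ambiguous.append(species)
    return pairs, ambiguous


# Made-up example data.
hits_a = [("human", "A_h1"), ("mouse", "A_m1"), ("mouse", "A_m2"), ("yeast", "A_y1")]
hits_b = [("human", "B_h1"), ("mouse", "B_m1"), ("zebrafish", "B_z1")]
pairs, ambiguous = pair_by_species(hits_a, hits_b)
print(pairs)      # [('A_h1', 'B_h1')]
print(ambiguous)  # ['mouse']
```

In this example the mouse has two candidates for chain A, so the sketch declines to pair them. Real pipelines need extra information to resolve such cases.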

Second, the methods used to analyze protein sequences often incorporate data from closely related proteins. This can be helpful because similar proteins tend to have similar structures. However, in the case of unique regions, like those found in immune system proteins, the reliance on evolutionary similarities can lead to inaccuracies.

To address these challenges, researchers have considered moving away from traditional sequence-based methods and sought new approaches to enhance predictions.

A New Approach: AFM-Refine-G

Enter AFM-Refine-G, a fine-tuned version of AlphaFold-Multimer. It takes structures already predicted by AlphaFold-Multimer and tries to improve them. Rather than relying on multiple sequence alignments or templates, AFM-Refine-G works from the predicted structure itself and refines it, much like polishing a diamond to make it shine brighter.

This new approach was tested on various datasets of protein structures. The idea was to use the predicted forms of proteins as a starting point and then fine-tune these shapes to achieve a more accurate representation of the actual protein structures. This involved selecting structures that were likely to interact well and focusing on improving these interactions during the refinement process.

Training and Testing the Model

Training AFM-Refine-G involved a meticulous process to ensure it could effectively refine protein structures. Initial predictions were generated using AlphaFold-Multimer, and these predicted structures were then fed into AFM-Refine-G for further enhancement.

The system evaluated the quality of each refined structure through various metrics. Researchers looked at how well the refined structures compared to the original predictions and the actual experimental data. They used multiple datasets for testing, allowing them to assess the model's performance comprehensively.

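Interestingly, the results were mixed. Of 1925 predicted structures across four test datasets, 203 improved by more than 0.05 in DockQ, a score of interface quality, after refinement, but the gain in per-target success rate was modest. For some datasets, AFM-Refine-G clearly improved the predicted structures, while for others the results were less favorable. This inconsistency suggested that the model might be biased toward certain types of protein structures.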
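The abstract of the original source reports that 203 of 1925 predicted structures had a DockQ improvement greater than 0.05 after refinement. The short Python sketch below shows how such a count could be made and what fraction the reported numbers correspond to. The helper `count_improved` and the example scores are made up for illustration; only the 0.05 threshold and the 203/1925 figures come from the source.

```python
def count_improved(before, after, threshold=0.05):
    """Count structures whose score rose by more than `threshold`."""
    return sum(1 for b, a in zip(before, after) if a - b > threshold)


# Made-up DockQ scores before and after refinement (not data from the paper).
before = [0.10, 0.45, 0.60, 0.80]
after = [0.30, 0.46, 0.55, 0.90]
print(count_improved(before, after))  # 2

# Fraction implied by the figures quoted in the abstract.
reported_fraction = 203 / 1925
print(f"{reported_fraction:.1%}")  # 10.5%
```

Roughly one structure in ten crossed the improvement threshold, which is consistent with the article's description of the results as mixed.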

Analyzing Results and Areas for Improvement

Upon analysis, it became clear that certain structures were improved while others faced challenges. In particular, multimeric structures associated with immune responses often ended up in the "Incorrect" category when evaluated against established criteria. This indicated that further enhancements were needed, particularly for these tricky proteins.
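The article does not spell out the "established criteria". Assuming they are the DockQ quality bands commonly used for docking models (the abstract reports results in DockQ), a classification could be sketched in Python as follows. The function name `dockq_class` is hypothetical, and the cut-offs are the commonly cited DockQ thresholds rather than values taken from this article.

```python
def dockq_class(score):
    """Map a DockQ score in [0, 1] to a quality label.

    Cut-offs are the commonly cited DockQ bands (an assumption here):
    below 0.23 Incorrect, below 0.49 Acceptable, below 0.80 Medium, else High.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("DockQ scores lie between 0 and 1")
    if score < 0.23:
        return "Incorrect"
    if score < 0.49:
        return "Acceptable"
    if score < 0.80:
        return "Medium"
    return "High"


for s in (0.10, 0.30, 0.65, 0.90):
    print(s, dockq_class(s))
```

Under these bands, a model scoring 0.10 would land in the "Incorrect" category mentioned above.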

Additionally, it was noticed that the connection between how well a model predicted a structure and the confidence level assigned to that prediction could be misleading. Sometimes, a structure might look good on paper, but in reality, it could have major flaws, like atoms clashing with one another in an unwanted way.
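A clash check of this kind can be sketched in a few lines of Python. The version below counts pairs of atoms that sit closer than a cut-off distance. It is a simplified illustration, not the check used in the original study: the 2.0 Å cut-off is a hypothetical choice, and the sketch assumes the input contains only atoms that are not bonded to each other, since bonded atoms legitimately sit closer than that.

```python
import math
from itertools import combinations


def count_clashes(coords, min_distance=2.0):
    """Count atom pairs closer than `min_distance` (in angstroms).

    coords: list of (x, y, z) tuples for non-bonded atoms (an assumption).
    This compares every pair, so it is only practical for small inputs.
    """
    return sum(
        1 for p, q in combinations(coords, 2) if math.dist(p, q) < min_distance
    )


# Made-up coordinates: two pairs are closer than 2.0 angstroms.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (5.0, 1.5, 0.0)]
print(count_clashes(atoms))  # 2
```

A structure can score well on confidence and still report a nonzero clash count, which is why the two measures are worth checking separately.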

The researchers speculated that this inconsistency could stem from how the model was trained. Since AFM-Refine-G was developed with a focus on more "normal" structures, it struggled with unconventional shapes, particularly those related to the immune system.

Keeping Up with Advancements

As science progresses, so do the tools available for researchers. After the creation of AFM-Refine-G, a newer version of AlphaFold-Multimer, version 2.3, was introduced. This updated version built upon the successes and lessons learned from previous models. It utilized new training methods and larger datasets, increasing the chances of better predictions.

To assess how AFM-Refine-G performed alongside the newer model, researchers tested it on version 2.3 outputs for more recent targets, including CASP15 multimer targets. They aimed to see whether AFM-Refine-G could still provide value even when faced with the latest advancements in protein structure prediction.

The Future of Protein Structure Prediction

The journey of protein structure prediction is far from over. While new models like AFM-Refine-G have shown promise, the landscape of biology is continually changing. The tools and methods will need to evolve to keep up with increasingly complex protein interactions, especially those involved in diseases.

In conclusion, while it might be a challenging field with a lot more puzzles to solve, the ongoing work in protein structure prediction is helping scientists unlock new doors in biology. As researchers refine their approaches and develop better models, we can expect exciting breakthroughs. With each piece of the puzzle that falls into place, our understanding of the intricate world of proteins will deepen, paving the way for new discoveries in medicine and beyond.

So, here's to the scientists and their perseverance! After all, in the world of protein prediction, they are the heroes navigating a labyrinth, holding the key to countless biological mysteries. Who knew that studying tiny molecules could lead to such large discoveries?

Original Source

Title: Refinement of AlphaFold-Multimer structures with single sequence input

Abstract: AlphaFold2, introduced by DeepMind in CASP14, demonstrated outstanding performance in predicting protein monomer structures. It could model more than 90% of targets with high accuracy, and so the next step would surely be multimer predictions, since many proteins do not act by themselves but with their binding partners. After the publication of AlphaFold2, DeepMind published AlphaFold-Multimer, which showed excellent performance in predicting multimeric structures. However, its accuracy still has room for improvement compared to that of monomer predictions by AlphaFold2. In this paper, we introduce a fine-tuned version of AlphaFold-Multimer, named AFM-Refine-G, which uses structures predicted by AlphaFold-Multimer as inputs and produces refined structures without the help of multiple sequence alignments or templates. The performance of AFM-Refine-G was assessed using four datasets: Ghani_et_al_Benchmark2 and Yin_et_al_Hard using AlphaFold-Multimer version 2.2 outputs, and CASP15_multimer and Yin_and_Pierce_af23 using AlphaFold-Multimer version 2.3 outputs. Of 1925 predicted structures, 203 had DockQ improvement > 0.05 after refinement, demonstrating that our model is useful for the refinement of multimer structures. However, considering the per target success rate, the overall improvement was modest, suggesting that the original AlphaFold-Multimer network had already learned a biophysical energy function independent of MSAs or templates, as proposed by Roney and Ovchinnikov (Roney and Ovchinnikov, 2022). Furthermore, both the default AlphaFold-Multimer and our refinement model showed lower performance for immune-related targets compared to general targets, indicating that room for improvement remains.

Availability: The inference scripts are available from https://github.com/t-oda-ic/afm_refiner under the Apache License, Version 2.0. The network parameters are available from https://figshare.com/articles/online_resource/afm_refine_g_20230110_zip/21856407 under the license CC BY 4.0.

Authors: Toshiyuki Oda

Last Update: 2024-12-26 00:00:00

Language: English

Source URL: https://www.biorxiv.org/content/10.1101/2022.12.27.521991

Source PDF: https://www.biorxiv.org/content/10.1101/2022.12.27.521991.full.pdf

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to biorxiv for use of its open access interoperability.
