Simple Science

Cutting edge science explained simply


New Program Aims to Improve Lipid Analysis

A novel model enhances accuracy in lipid classification and analysis.

Lipids are important substances in our bodies. They help build cell membranes, send signals, and store energy. When our bodies do not process lipids correctly, disease can follow. More than 48,000 lipids and lipid-like molecules have been cataloged under a lipid classification system. Advanced techniques are now used to measure both the variety and the amounts of lipids. By combining liquid chromatography with mass spectrometry, researchers can analyze hundreds to thousands of lipid molecules from just one sample.

Analyzing Lipids

Researchers use mass spectrometry techniques to study lipids. In tandem mass spectrometry (MS/MS), lipid ions are broken into fragments, which produces a pattern of fragment ions. These ion patterns help identify the structure of lipids. Many software programs assist in analyzing these patterns to determine lipid structures. Some programs use specific rules that check for key ions to classify lipids into main groups and subcategories. However, current methods sometimes mislabel lipids because of noise from the equipment or overlapping signals. Better methods are therefore needed to identify lipid classes more accurately.
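The rule-based idea can be illustrated with a minimal sketch. It is not taken from any particular lipidomics program: the rule table, the tolerance, and the intensity cutoff are assumptions chosen for illustration. The one real value is the fragment at m/z 184.0733, the phosphocholine head group seen in positive-ion mode, which is shared by phosphatidylcholines (PC) and sphingomyelins (SM). The sketch shows how one key ion can leave two candidate classes, and how a weak peak can fall below the cutoff.

```python
# Hypothetical rule table: lipid class -> fragment m/z values that must be present.
# Only the 184.0733 phosphocholine ion is a real reference value; the table
# itself is an illustrative assumption, not the rule set of any real program.
DIAGNOSTIC_IONS = {
    "PC": [184.0733],
    "SM": [184.0733],
}


def has_ion(peaks, target_mz, tol=0.01, min_rel_intensity=0.05):
    """Return True if a peak within `tol` of `target_mz` is strong enough.

    `peaks` is a list of (m/z, intensity) pairs. Intensity is judged
    relative to the strongest peak in the spectrum.
    """
    if not peaks:
        return False
    base = max(intensity for _, intensity in peaks)
    if base <= 0:
        return False
    return any(
        abs(mz - target_mz) <= tol and intensity / base >= min_rel_intensity
        for mz, intensity in peaks
    )


def candidate_classes(peaks):
    """List every class whose required ions are all present."""
    return [
        name
        for name, ions in DIAGNOSTIC_IONS.items()
        if all(has_ion(peaks, mz) for mz in ions)
    ]


clean = [(184.0735, 1000.0), (496.34, 120.0)]   # made-up spectrum
noisy = [(184.0735, 10.0), (500.0, 1000.0)]     # key ion is only 1% of base peak

print(candidate_classes(clean))  # ['PC', 'SM'] -> the rule alone is ambiguous
print(candidate_classes(noisy))  # [] -> the key ion is lost below the cutoff
```

Both outcomes mirror the problems described above: shared ions give overlapping answers, and noise can hide the ion a rule depends on.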

Advances in Analysis Software

In 2020, a new version of the MS-DIAL program (MS-DIAL 4) was introduced. This program helps researchers analyze lipid samples more effectively. It has been used in many projects, analyzing over 16,000 biological samples. MS-DIAL classifies lipids by comparing them to standard reference values, allowing for more precise identifications. The software has produced over 82,000 lipid annotations, but human review is still necessary to ensure accuracy. Only about half of these annotations are considered confident, while the rest may be mixtures or misidentifications.

Machine Learning in Lipid Analysis

To improve lipid classification, a new machine learning model called MS2Lipid was developed. The model is trained on accurately labeled spectral records from previous analyses. Unlike many studies that rely on spectra of standard compounds, this study uses data from biological samples. The model aims to predict lipid subclasses more accurately, and its results were compared against a benchmark program.

Data Collection for MS2Lipid

To create the MS2Lipid model, researchers analyzed over 16,600 samples from various projects using consistent methods. The data were processed through the MS-DIAL program to categorize multiple lipid subclasses. Experienced chemists manually reviewed the results to ensure quality. Ultimately, the researchers compiled data on thousands of unique lipids, allowing for effective machine learning training.

Validation of MS2Lipid

To test the accuracy of MS2Lipid, researchers obtained additional data from various projects. These data came from different machines and were reviewed by different curators. The model was assessed on how well it classified these new lipid spectra. A key goal was to check whether MS2Lipid still makes accurate predictions when the machines and analysis conditions differ from those used for training.

Creating the Machine Learning Model

With the collected data, researchers constructed the MS2Lipid model. They organized the data in a form that a model can learn from. Four methods were tested: support vector machine, k-nearest neighbor, random forest, and deep neural network. The deep neural network performed best and was selected for the final model.
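The article does not say exactly how the spectra were organized, so the following is only a sketch of one common approach: each spectrum is binned into a fixed-length vector, and a query is then classified by its nearest neighbors. k-nearest neighbor is used here because it is the simplest of the four methods listed, not because it was the final choice. The bin width, the m/z range, the cosine similarity measure, the labels "A" and "B", and all spectra are assumptions for illustration.

```python
import math
from collections import Counter


def bin_spectrum(peaks, max_mz=1000.0, bin_width=1.0):
    """Turn (m/z, intensity) pairs into a fixed-length vector scaled to max 1."""
    n_bins = int(max_mz / bin_width)
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        if 0 <= mz < max_mz:
            vec[int(mz / bin_width)] += intensity
    top = max(vec)
    return [v / top for v in vec] if top > 0 else vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query_vec, training, k=3):
    """Majority vote among the k training vectors most similar to the query.

    `training` is a list of (vector, label) pairs.
    """
    ranked = sorted(training, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up training spectra for two hypothetical subclasses "A" and "B".
training = [
    (bin_spectrum([(184.07, 100.0), (104.10, 20.0)]), "A"),
    (bin_spectrum([(184.07, 90.0), (86.10, 10.0)]), "A"),
    (bin_spectrum([(264.27, 100.0), (282.28, 40.0)]), "B"),
    (bin_spectrum([(264.27, 80.0), (252.27, 30.0)]), "B"),
]

query = bin_spectrum([(184.07, 50.0), (104.10, 5.0)])
print(knn_predict(query, training))  # A
```

A deep neural network would take the same kind of fixed-length vector as input and learn the mapping to subclasses instead of looking up neighbors.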

Important Features in Lipid Classification

To understand the model's decisions, researchers used a method called SHAP (SHapley Additive exPlanations) to identify the factors that matter most for predicting lipid subclasses. This method shows which features of the data contributed most to each prediction. Some neutral loss values and mass measurements stood out as significant predictors.
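A neutral loss is the mass difference between the intact (precursor) ion and one of its fragments, so it describes the piece that broke off without being detected. The sketch below only shows how such values can be derived from a spectrum. How MS2Lipid builds its features is not described here, and the function name, the `min_loss` cutoff, and the example spectrum are assumptions.

```python
def neutral_losses(precursor_mz, peaks, min_loss=1.0):
    """Return (neutral loss, intensity) pairs for fragments lighter than the precursor.

    neutral loss = precursor m/z - fragment m/z
    Losses smaller than `min_loss` (e.g. the unfragmented precursor) are skipped.
    """
    return [
        (round(precursor_mz - mz, 4), intensity)
        for mz, intensity in peaks
        if precursor_mz - mz >= min_loss
    ]


# Illustrative spectrum: one fragment plus the surviving precursor peak.
spectrum = [(184.0733, 1000.0), (760.5851, 50.0)]
for loss, intensity in neutral_losses(760.5851, spectrum):
    print(f"neutral loss {loss:.4f} (fragment intensity {intensity})")
```

Fragment masses and neutral losses carry complementary information, which may be why both kinds of values appear among the important predictors.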

Comparing MS2Lipid and Other Programs

The performance of the MS2Lipid model was compared against the CANOPUS program, which also predicts lipid identities. While both programs provide useful predictions, MS2Lipid was more accurate, particularly with complex lipid mixtures: it classified 95.5% of the test queries correctly, whereas CANOPUS gave correct annotations for 38.7% of the same test set. CANOPUS also returned no result for many queries.
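When one program leaves some queries unanswered, the way accuracy is counted matters. The sketch below assumes that an unanswered query (represented as `None`) counts as incorrect, so that both programs are scored over the same full test set. Whether the study scored it exactly this way is an assumption, and the labels are made up.

```python
def accuracy(predictions, truths):
    """Fraction of all queries answered correctly; None counts as wrong."""
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")
    if not truths:
        return 0.0
    correct = sum(1 for p, t in zip(predictions, truths) if p is not None and p == t)
    return correct / len(truths)


def coverage(predictions):
    """Fraction of queries for which the program returned any answer."""
    if not predictions:
        return 0.0
    return sum(1 for p in predictions if p is not None) / len(predictions)


truths = ["PC", "PE", "TG", "SM"]
program_a = ["PC", "PE", "TG", "PC"]   # answers everything, one mistake
program_b = ["PC", None, None, "SM"]   # never wrong, but skips half

print(accuracy(program_a, truths), coverage(program_a))  # 0.75 1.0
print(accuracy(program_b, truths), coverage(program_b))  # 0.5 0.5
```

Reporting coverage next to accuracy separates "gave a wrong answer" from "gave no answer", which helps when interpreting a comparison like the one above.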

Evaluating the Model's Reliability

Researchers assessed how robust and scalable the MS2Lipid model was by testing it with spectra from various machines and analysts. Overall, it performed well, with an average accuracy above 87.4% across these datasets. However, results varied somewhat with the type of machine and the method used in the analysis.

Discovering Unknown Lipids

The MS2Lipid program was also used to annotate previously unknown lipid molecules in a human cohort study. By reanalyzing complex lipid data sets, researchers identified several new lipid structures. These included new esterified bile acids, which were significantly more abundant in obese patients.

The Role of Lipids in Health

Lipids play a critical role in human health. By identifying their structures and functions, researchers can better understand how lipids affect our bodies. This knowledge can lead to new insights into health conditions related to lipid metabolism.

Conclusion

In summary, the MS2Lipid program is a significant advance for lipidomics. Its ability to classify lipid subclasses accurately helps researchers understand the complex roles lipids play in our biology. As technology evolves, so will our understanding of lipids, which should contribute to advances in health and medicine. Future work will focus on refining the model and on predicting lipid structures in more detail.

Original Source

Title: MS2Lipid: a lipid subclass prediction program using machine learning and curated tandem mass spectral data

Abstract: Untargeted lipidomics using collision-induced dissociation-based tandem mass spectrometry (CID-MS/MS) is essential for biological and clinical applications. However, annotation confidence is still guaranteed by manual curation by analytical chemists, although various software tools have been developed for automatic spectral processing based on rule-based fragment annotations. In this study, we provide a novel machine learning model, MS2Lipid, for the prediction of lipid subclasses from MS/MS queries to provide an orthogonal decision of lipidomics software programs to determine the lipid subclass of ion features, in which a new descriptor, MCH (mode of carbon and hydrogen), was designed to increase the specificity of lipid subclasses in nominal mass resolution MS data. The model trained with 5,224 and 5,408 manually curated MS/MS spectra for the positive- and negative-ion modes mapped the query into one or several categories of 97 lipid subclasses, with an accuracy of 95.5% queries in the test set. Our program outperformed the CANOPUS ontology prediction program, providing correct annotations for 38.7% of the same test set. The program was further validated using various datasets from different machines and curators, and the average accuracy exceeded 87.4 %. Furthermore, the function of MS2Lipid was showcased by the annotation of novel esterified bile acids, whose abundance was significantly increased in obese patients in a human cohort study, suggesting that the machine learning model provides an independent criterion for lipid subclass classification, in addition to an environment for annotating lipid metabolites that have been previously unknown.

Authors: Hiroshi Tsugawa, N. Sakamoto, T. Oka, Y. Matsuzawa, K. Nishida, A. Hori, M. Arita

Last Update: 2024-05-18 00:00:00

Language: English

Source URL: https://www.biorxiv.org/content/10.1101/2024.05.16.594510

Source PDF: https://www.biorxiv.org/content/10.1101/2024.05.16.594510.full.pdf

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to biorxiv for use of its open access interoperability.
