Advancements in Fish Composition Analysis Using Machine Learning

Machine learning improves fish biochemical analysis with Raman spectroscopy.

Yun Zhou, Gang Chen, Bing Xue, Mengjie Zhang, Jeremy S. Rooney, Kirill Lagutin, Andrew MacKenzie, Keith C. Gordon, Daniel P. Killeen

Analyzing the chemical makeup of fish is vital for the seafood industry because it supports the efficient extraction of valuable products. The analysis typically means measuring the amounts of water, protein, and fat (lipids) in different fish species. These amounts are hard to pin down, however, because they vary with where and when the fish were caught.

Raman spectroscopy is a technique that can speed up and simplify this process. It lets scientists examine a fish sample's chemical composition without damaging it. Using computer programs that learn from data (machine learning), researchers can link Raman spectra to known biochemical reference measurements from fish. This study looks at how well different machine learning models can predict the water, protein, and fat contents of fish from Raman data.

The Importance of Fish Composition Analysis

In New Zealand, certain fish like Hoki and Mackerel are commonly caught but often turned into low-value products, such as fishmeal. Understanding the biochemical composition of these fish could make it possible to extract higher-value products like omega-3 oil and protein. Fish typically consist of about 70-80% water, 10-20% protein, and 2-8% fat, which gives a rough baseline for the analysis. However, these percentages can change significantly with environmental factors, which makes measurement tricky.

Using Raman Spectroscopy

Raman spectroscopy, including techniques like Fourier Transform Raman and InGaAs Raman, is effective for analyzing the makeup of fish in a quick and non-destructive way. The data it provides can create distinct patterns that represent different biochemical components. Researchers aim to use this data to build a model that can predict these chemical contents accurately.

Challenges in Data Analysis

Traditional methods of analyzing Raman data often involve many complex steps. Past studies have used various simpler computer algorithms to make predictions but faced difficulties, as these models struggled to handle complex relationships in the data. Newer methods, specifically Convolutional Neural Networks (CNNs), show promise as they can automatically learn from data and find patterns more effectively.

Because fish samples are difficult to collect and measure in large numbers, the researchers had only a small dataset to work with. This limitation can lead to a problem known as overfitting, where a model fits the quirks of its limited training data instead of generalizing well to new data. This study aimed to create a new CNN model designed specifically for the small datasets found in fish analysis.

The researchers developed a framework called FishCNN that combines data preparation and augmentation methods to tackle these challenges. The approach is intended to improve the model's performance and reliability.

Methodology of FishCNN

Data Collection

In the study, two types of Raman spectroscopy data were collected from fish samples. The researchers ensured that the data gathered was comprehensive enough to cover various aspects of the fish’s chemical properties. They worked with specific Raman techniques to reduce interference from other factors, such as the container the fish samples were placed in.

Data Preprocessing

To clean the data and prepare it for analysis, several methods were used to remove noise and improve the quality of the spectral signals obtained from the Raman techniques. The aims were to correct any background noise, remove distortion, and enhance the quality of the signals captured in the data.

The researchers devised a method for trying out different preprocessing techniques to identify which combination produced the best results with the CNN model. Choosing the right preprocessing steps matters because they set the foundation for the rest of the analysis.
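The search over preprocessing combinations described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the three steps (`subtract_baseline`, `smooth`, `snv`) and their names are assumptions chosen as common spectral preprocessing operations, and the sketch only enumerates and applies candidate pipelines.

```python
from itertools import combinations
from statistics import mean, pstdev

# Illustrative preprocessing steps (assumptions, not the paper's exact methods).
def subtract_baseline(spectrum):
    """Crude background correction: subtract the minimum intensity."""
    floor = min(spectrum)
    return [x - floor for x in spectrum]

def smooth(spectrum, window=3):
    """Moving-average smoothing; points near the edges use a shorter window."""
    half = window // 2
    out = []
    for i in range(len(spectrum)):
        lo, hi = max(0, i - half), min(len(spectrum), i + half + 1)
        out.append(mean(spectrum[lo:hi]))
    return out

def snv(spectrum):
    """Standard normal variate: zero mean and unit variance per spectrum."""
    mu, sigma = mean(spectrum), pstdev(spectrum)
    if sigma == 0:
        return [0.0 for _ in spectrum]
    return [(x - mu) / sigma for x in spectrum]

STEPS = {"baseline": subtract_baseline, "smooth": smooth, "snv": snv}

def candidate_pipelines(step_names):
    """Yield every non-empty subset of steps, keeping the listed order."""
    for r in range(1, len(step_names) + 1):
        yield from combinations(step_names, r)

def apply_pipeline(spectrum, pipeline):
    """Apply the named steps to one spectrum, in order."""
    for name in pipeline:
        spectrum = STEPS[name](spectrum)
    return spectrum

pipelines = list(candidate_pipelines(list(STEPS)))
example = apply_pipeline([5.0, 6.0, 9.0, 6.0, 5.0], ("baseline", "snv"))
```

In a real experiment, each candidate pipeline would be scored by training the model on the preprocessed spectra and comparing validation error; that scoring step is omitted here.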

Data Augmentation

Since the amount of data collected was relatively small, researchers looked into data augmentation methods to artificially increase the dataset size. This involved creating modified versions of the original data while ensuring that the quality of key features was preserved. By using these augmented datasets, they aimed to expose the CNN model to a wider range of variations during training, which helps improve the model's generalization.
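As a rough illustration of the idea, the sketch below creates perturbed copies of each spectrum. The three perturbations (a small shift along the spectral axis, a slight intensity rescaling, and additive noise) and all parameter values are assumptions picked for illustration; the article does not say which augmentation methods the study used.

```python
import random

def augment(spectrum, rng, noise_sd=0.01, max_shift=2, scale_range=0.05):
    """Return one perturbed copy of a spectrum (a list of intensities)."""
    # 1. Shift along the spectral axis, padding with the edge value.
    shift = rng.randint(-max_shift, max_shift)
    if shift > 0:
        shifted = [spectrum[0]] * shift + spectrum[:-shift]
    elif shift < 0:
        shifted = spectrum[-shift:] + [spectrum[-1]] * (-shift)
    else:
        shifted = list(spectrum)
    # 2. Rescale the overall intensity by a factor close to 1.
    scale = 1.0 + rng.uniform(-scale_range, scale_range)
    # 3. Add small Gaussian noise to every point.
    return [scale * x + rng.gauss(0.0, noise_sd) for x in shifted]

def augment_dataset(spectra, targets, copies=3, seed=0):
    """Keep the originals and append `copies` perturbed versions of each."""
    rng = random.Random(seed)
    out_x, out_y = list(spectra), list(targets)
    for x, y in zip(spectra, targets):
        for _ in range(copies):
            out_x.append(augment(x, rng))
            out_y.append(y)  # the reference values are left unchanged
    return out_x, out_y

# Made-up toy data: two short "spectra" with water/protein/fat percentages.
spectra = [[0.1, 0.4, 0.9, 0.4, 0.1], [0.2, 0.3, 0.5, 0.3, 0.2]]
targets = [(75.0, 18.0, 5.0), (72.0, 17.0, 8.0)]
aug_x, aug_y = augment_dataset(spectra, targets)
```

The perturbations are kept small so that peak positions and relative heights, the features that carry the chemical information, stay recognizable. Augmentation should be applied only to training data, so that test spectra remain untouched measurements.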

CNN Architecture

The CNN model designed for this study had a unique structure. It included two layers that extracted features and two additional layers that further processed these features to make predictions about the biochemical content. The model used larger filters with smaller strides compared to traditional methods, allowing it to capture complex patterns in the data.
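To see why filter size and stride matter, the sketch below implements a plain 1D convolution and compares the feature-map lengths produced by different settings. The specific numbers (a 1000-point spectrum, filter widths of 25 and 5) are hypothetical; the article does not give the model's actual hyperparameters.

```python
def conv_output_length(n_inputs, kernel_size, stride):
    """Length of a 1D convolution output when no padding is used."""
    return (n_inputs - kernel_size) // stride + 1

def conv1d(signal, kernel, stride=1):
    """Slide `kernel` over `signal` (no padding) and return the dot products."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, signal[start:start + k]))
        for start in range(0, len(signal) - k + 1, stride)
    ]

spectrum_length = 1000  # hypothetical number of spectral points

# A wide filter moved one point at a time keeps a dense feature map ...
wide_dense = conv_output_length(spectrum_length, kernel_size=25, stride=1)
# ... while a narrow filter with a large stride discards most positions.
narrow_sparse = conv_output_length(spectrum_length, kernel_size=5, stride=5)

# Two stacked feature-extraction layers with the wide, dense setting.
after_layer1 = conv_output_length(spectrum_length, 25, 1)
after_layer2 = conv_output_length(after_layer1, 25, 1)

# A tiny worked example: a smoothing-like filter passed over a single peak.
peak = [0, 0, 1, 3, 1, 0, 0]
response = conv1d(peak, kernel=[1, 2, 1], stride=1)
```

A wide filter sees a whole Raman band at once, and a stride of 1 evaluates it at every position, so broad or slightly shifted peaks are less likely to be missed. The trade-off is a longer feature map for the later layers to process.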

The goal was to create a system that could effectively analyze the Raman spectral data and predict the water, protein, and fat contents in fish samples accurately.

Experimental Design

Researchers put the FishCNN model through various tests to evaluate its performance. They compared it against traditional predictive models to see how well it performed in predicting the biochemical composition of fish using Raman spectroscopy data.

Data Evaluation

The dataset was divided into several parts for testing and training. Each section allowed the researchers to assess how well the model could predict biochemical contents based on the data it had learned from.

Researchers performed multiple runs to ensure that the results were consistent and statistically significant. They also made use of regularization techniques to further reduce the chances of the model overfitting to the limited dataset.
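One common way to organize such an evaluation is repeated k-fold cross-validation, sketched below. This is an assumption about the general scheme rather than the study's exact protocol: the number of folds, the seeds, and the placeholder `dummy_score` function are all hypothetical.

```python
import random
from statistics import mean, stdev

def k_fold_indices(n_samples, k, seed):
    """Shuffle sample indices and yield k (train, test) index pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def repeated_cv(n_samples, k, seeds, score_fn):
    """Run k-fold CV once per seed; return mean and spread of fold scores."""
    scores = [
        score_fn(train, test)
        for seed in seeds
        for train, test in k_fold_indices(n_samples, k, seed)
    ]
    return mean(scores), stdev(scores)

# Placeholder scorer: a real one would train the model on `train`
# and return its prediction error on `test`.
def dummy_score(train, test):
    return len(test) / (len(train) + len(test))

avg, spread = repeated_cv(n_samples=30, k=5, seeds=[0, 1, 2], score_fn=dummy_score)
```

Every sample is used for testing exactly once per run, which makes the most of a small dataset. Repeating the procedure with different seeds shows how much the score depends on the particular split.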

Results of the FishCNN Model

The FishCNN model consistently outperformed the traditional predictive models in estimating the biochemical components of fish. The framework achieved high accuracy even with a small dataset.

The researchers found that although the InGaAs Raman data contained fewer features, it delivered better predictive power than the FT-Raman data. This indicated the effectiveness of the processing and modeling approach used in this study.

Predictions and Analysis

The predictions for the individual components (water, protein, and fat) were also analyzed. The CNN model performed consistently well across all three, demonstrating its overall reliability. However, predicting fat content proved more challenging, underscoring the complexities involved in analyzing biochemical data.

Conclusion and Future Directions

This study illustrates a successful application of machine learning techniques to analyze complex spectral data from fish samples. The development of the FishCNN framework bridges a significant gap in the field, allowing for more accurate predictions of biochemical compositions with limited data.

Researchers discovered that careful data preprocessing followed by augmentation was essential in creating a robust model. Moving forward, there are plans to investigate more advanced machine learning techniques and explore their potential in enhancing the accuracy and interpretability of spectral data analysis in the seafood industry.

Future work may involve testing different models and methods to further refine the predictions and incorporate insights from other machine learning techniques. The study also opens the door to more research on CNNs for small datasets, which could offer new possibilities in marine biochemistry analysis and support better practices in the seafood industry.

Summary

In summary, the work presented in this study brings forth a solution for analyzing the biochemical composition in fish through Raman spectroscopy using machine learning techniques. The tailored CNN architecture, along with robust data preparation methods, enables researchers to overcome hurdles presented by small datasets, ensuring effective predictions of key biochemical components. The findings not only offer immediate implications for the seafood industry but also set the stage for future explorations in the realm of spectral analysis using machine learning.

Original Source

Title: Machine Learning for Raman Spectroscopy-based Cyber-Marine Fish Biochemical Composition Analysis

Abstract: The rapid and accurate detection of biochemical compositions in fish is a crucial real-world task that facilitates optimal utilization and extraction of high-value products in the seafood industry. Raman spectroscopy provides a promising solution for quickly and non-destructively analyzing the biochemical composition of fish by associating Raman spectra with biochemical reference data using machine learning regression models. This paper investigates different regression models to address this task and proposes a new design of Convolutional Neural Networks (CNNs) for jointly predicting water, protein, and lipids yield. To the best of our knowledge, we are the first to conduct a successful study employing CNNs to analyze the biochemical composition of fish based on a very small Raman spectroscopic dataset. Our approach combines a tailored CNN architecture with the comprehensive data preparation procedure, effectively mitigating the challenges posed by extreme data scarcity. The results demonstrate that our CNN can significantly outperform two state-of-the-art CNN models and multiple traditional machine learning models, paving the way for accurate and automated analysis of fish biochemical composition.

Authors: Yun Zhou, Gang Chen, Bing Xue, Mengjie Zhang, Jeremy S. Rooney, Kirill Lagutin, Andrew MacKenzie, Keith C. Gordon, Daniel P. Killeen

Last Update: Sep 29, 2024

Language: English

Source URL: https://arxiv.org/abs/2409.19688

Source PDF: https://arxiv.org/pdf/2409.19688

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arxiv for use of its open access interoperability.
