Simple Science

Cutting edge science explained simply


Segmentation in Biomedical Imaging: A Necessary Step for Discovery

Segmentation helps scientists analyze biomedical images for better health insights.

Elaine ML Ho, Dimitrios Ladakis, Mark Basham, Michele C Darrow



Segmentation in biomedical imaging: advancing research through effective data collaboration.

Segmentation is a key step in understanding images taken from various biomedical imaging techniques. Think of it as coloring in a coloring book, but instead of colors, researchers draw lines around specific shapes in 3D images. By doing this, scientists can learn more about cells, tissues, and even the effects of diseases.

The Importance of Segmentation

In the world of science, especially in health research, finding the right information is crucial. Segmentation plays a vital role in answering questions. It helps scientists break down what they see in images into smaller pieces that can be examined closely. For instance, researchers have utilized segmentation to analyze blood-nerve barriers, vascular structures in horse placentas, and the impacts of viral infections.

It’s everywhere! From electron microscopy to X-rays, segmentation is a necessity. Until not too long ago, scientists had to do this work by hand, tracing shapes slice by slice through an image. It was a labor-intensive task, and sometimes it felt like trying to find a needle in a haystack.

Technology Comes to the Rescue

With advancements in technology, especially machine learning, the process of segmentation has taken a big leap forward. Researchers can now use computers to help with the task, making it faster and more efficient. However, it’s not a complete transformation just yet. Manual effort is still needed to ensure accuracy, which means a lot of scientists are still putting in their hours, balancing between machines and their own expertise.

The Need for Data Repositories

When scientists create high-quality segmentations, it’s crucial to share this data so others can use it. However, there are not many reliable places to store and access these segmentations. EMPIAR, for example, is a popular database for electron microscopy data. Yet, despite hosting a massive amount of data, it faces challenges like inconsistent information about its datasets. It’s like trying to find a book in a library where some titles are incorrectly labeled.

Other repositories exist, but they may not be widely known, leading to more headaches for researchers trying to share their findings. There are cases where segmentation data is only accessible through specific requests or is hidden behind a maze of complex links. It's a bit like trying to find buried treasure with an outdated map!

Evaluating Available Segmentation Data

Researchers recently took a look at the segmentation data available to the public. They searched through various databases and publications to figure out what’s out there, how it’s being used, and what barriers prevent it from being fully utilized. They focused on studies from 2014 to 2024, gathering information on the types of segmentation being produced and where the data ends up.

Some key points they looked into included:

  • Type of Study: What was the focus? Was it biological, methodological, or about software?
  • Purpose of Segmentation: Was it for pretty pictures, analysis, or to show new techniques?
  • Where Data Was Stored: Were the images and segmentations deposited in relevant places?
  • Imaging Technique Used: What tools were used to obtain the images?
  • File Types: What formats are the data in? Are they easy to open?
  • Source of Data: Was the data created for this study or borrowed from somewhere else?
  • Segmentation Method: Was it done manually, with some automation, or entirely automated?
  • Biological Scale: What kind of biological features were being examined?
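To picture how a survey like this works, the criteria above can be sketched as one record per reviewed publication, with tallies computed per field. This is only an illustrative sketch; the field names and example values are hypothetical, not the authors' actual survey schema.

```python
from collections import Counter
from dataclasses import dataclass

# Illustrative sketch only: field names and values are hypothetical,
# not the authors' actual survey schema.
@dataclass
class SurveyEntry:
    study_type: str           # "biological", "methodological", "software"
    purpose: str              # "visualization", "analysis", "technique demo"
    repository: str           # e.g. "EMPIAR", "none", "on request"
    imaging_technique: str    # e.g. "cryo-EM", "X-ray micro-CT"
    file_format: str          # e.g. "TIFF", "HDF5", "MRC"
    data_source: str          # "original" or "reused"
    segmentation_method: str  # "manual", "semi-automated", "automated"
    biological_scale: str     # e.g. "organelle", "cell", "tissue"

def tally(entries, field):
    """Count how many entries fall into each category of one field."""
    return Counter(getattr(e, field) for e in entries)

entries = [
    SurveyEntry("biological", "analysis", "EMPIAR", "cryo-EM",
                "MRC", "original", "manual", "organelle"),
    SurveyEntry("methodological", "technique demo", "none",
                "X-ray micro-CT", "TIFF", "reused", "automated", "tissue"),
]
print(tally(entries, "segmentation_method"))
```

Tallying records like these across hundreds of publications is what lets the authors report field-wide statistics, such as reuse rates per imaging technique.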

Challenges in Data Reusability

Despite the effort to gather data, several challenges make it tough for researchers to reuse segmentation data. If data is hard to find or access, it becomes useless. Among the studies reviewed, a substantial proportion of data was either missing, not deposited, or hard to track down. For example, almost 76% of the training data needed by these studies was unavailable.

Scientists often want to build their work on the foundation of previous studies. However, if the necessary data isn’t easy to find, it slows down research progress. Think of it like trying to bake a cake without the key ingredient—good luck with that!

Inconsistent Data Formats Create Headaches

Another big issue lies in the variety of file formats used for storing data. The researchers found that data was stored in 26 different formats! This diversity makes it difficult for scientists to join forces or combine data from different studies. It’s like trying to fit a square peg in a round hole!

Even the metadata, the information describing the data, wasn't standardized across different databases. This inconsistency further complicates matters when scientists try to integrate data from various sources. In the worst cases, some terms had entirely different meanings across fields, leading to confusion.

Low Rates of Data Reuse

One eye-opening finding was just how low the rates of data reuse were. Of all the publications, only a small fraction reused existing data. Many scientists still preferred to collect their own data rather than searching through archives. This might be due to various reasons, such as the difficulty of finding data or simply a lack of awareness about what is available.

When scientists explored data reuse within specific fields—like connectomics—they found that a good number of studies successfully reused data. However, even within this niche, challenges remained in finding quality data.

Differences in Imaging Techniques

The study also highlighted differences based on imaging techniques. Some approaches had higher rates of data reuse than others. For example, X-ray micro-computed tomography had a notably low reuse rate, whereas room-temperature electron microscopy scored higher, thanks to more data being shared in public challenges.

Each technique has its quirks, and these quirks can impact the availability and usability of data. The key, however, remains the same: improving how data is shared and making it easier for researchers to find and use.

The Need for Clearer Terminology

In the bioimaging field, some common terms can cause confusion. Words like "reconstruction," "mask," and "segmentation" might seem straightforward, but they can mean different things in different contexts. This confusion can lead to misinterpretation.

For example, when researchers say "segmentation," they usually refer to identifying different parts of an image. However, in some cases, it has been used to describe putting an average object back into an image. This can cause the actual meaning to get lost in translation, especially for less experienced researchers.

Improving Metadata for Better Understanding

A significant part of making datasets easier to use lies in enhancing metadata. Metadata helps explain what’s in a dataset. The researchers pointed out that segmentation data needs better metadata to truly understand its purpose and quality. Simple details about what it is and how it was created would go a long way!

For instance, knowing what kind of biological feature was being looked at and how accurate the segmentation is would be beneficial. Enhanced search capabilities and better metadata could help researchers find the right datasets that fit their needs more efficiently.
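As a concrete picture of what "better metadata" could look like, here is a minimal, hypothetical record for a deposited segmentation, expressed as JSON. The field names and values are illustrative assumptions, not a published standard; the point is that even a handful of standardized fields would answer the questions raised above.

```python
import json

# Hypothetical minimal metadata record for a deposited segmentation.
# Field names are illustrative, not a published standard.
metadata = {
    "dataset_id": "EXAMPLE-0001",          # placeholder identifier
    "parent_image": "EMPIAR-XXXXX",        # the raw imaging dataset segmented
    "biological_feature": "mitochondria",  # what was segmented
    "biological_scale": "organelle",
    "imaging_technique": "cryo-ET",
    "segmentation_method": "semi-automated",
    "software": "example-tool v1.2",       # hypothetical tool name
    "file_format": "HDF5",
    "validation": "expert-reviewed",       # how accuracy was assessed
    "licence": "CC-BY-4.0",
}
print(json.dumps(metadata, indent=2))
```

A record like this tells a would-be reuser what the segmentation contains, how it was made, and how trustworthy it is, before they download a single byte.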

Recommendations for Researchers

To make things better, the research community needs to take action at various levels. Here are some simple steps:

  1. Share Your Data: When researchers have valuable data, it’s essential to deposit it in a proper repository. This includes image data, training data, labels, and code.
  2. Choose the Right Repository: Select databases that provide permanent links to data. Avoid temporary or personal sites that may not last.
  3. Be Clear: When writing about research, descriptions of the data need to be clear and precise, so future users know what to expect.
  4. Encourage Standards: Everyone involved in research should work together to ensure consistent file formats, descriptions, and metadata. It might be a tough puzzle to solve, but everyone loves a challenge, right?
  5. Support Public Challenges: These challenges are essential for advancing the field, and they should be celebrated and encouraged.
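The recommendations above could even be turned into a simple pre-deposition checklist. The sketch below is a hypothetical illustration of that idea; the item wording is the author's own paraphrase, not an official checklist from the paper.

```python
# Hypothetical pre-deposition checklist paraphrasing the article's
# recommendations; item wording is illustrative, not official.
CHECKLIST = [
    "Image data, training data, labels, and code deposited",
    "Repository issues permanent identifiers for the data",
    "Dataset description is clear and precise",
    "Community-standard file formats and metadata used",
]

def ready_to_deposit(done: set) -> bool:
    """True only if every checklist item has been completed."""
    return all(item in done for item in CHECKLIST)

print(ready_to_deposit(set(CHECKLIST)))   # all items done: True
print(ready_to_deposit({CHECKLIST[0]}))   # only one item done: False
```

Journals and repositories could enforce something like this at submission time, which is where most of the missing-data problems described earlier could be caught.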

The Role of Repositories

Repositories also have a role to play in this improvement effort. They should provide tools that make it easier for scientists to search for, access, and upload their data. Adopting standardized and user-friendly file formats could help researchers save time and resources.

The Future of Data Sharing

There’s a strong need for change in how segmentation data is deposited and reused. Good data sharing practices will help the entire research community, especially those developing new segmentation tools that may rely on large sets of data.

With clearer descriptions, streamlined processes, and shared goals, the bioimaging community can ensure that valuable data doesn’t go to waste. By working together, researchers can set the stage for the next wave of discoveries in biomedical imaging.

Conclusion

In summary, segmentation is an essential step in evaluating biomedical images, allowing scientists to draw important conclusions from their data. The transition to more automated processes is promising, but manual input remains vital. Furthermore, a push for better data sharing practices and standardized metadata can bridge the gap between what researchers currently have and what they need for advancements in this field.

Just like in a big family, everyone needs to pitch in for a smooth-running household. If researchers collaborate and share their data more freely, the future of biomedical imaging will surely shine brighter!

Original Source

Title: Depositing biological segmentation datasets FAIRly

Abstract: Segmentation of biological images identifies regions of an image which correspond to specific features of interest, which can be analysed quantitatively to answer biological questions. This task has long been a barrier to conducting large-scale biological imaging studies as it is time- and labour-intensive. Modern artificial intelligence segmentation tools can automate this process, but require high quality segmentation data for training, which is challenging to acquire. Biological segmentation data has been produced for many years, but this data is not often reused to develop new tools as it is hard to find, access, and use. Recent disparate efforts (Iudin, et al., 2023; Xu, et al., 2021; Vogelstein, et al., 2018; Ermel, et al., 2024) have been made to facilitate deposition and re-use of these valuable datasets, but more work is needed to increase re-usability. In this work, we review the current state of publicly available annotation and segmentation datasets and make specific recommendations to increase re-usability following FAIR (findable, accessible, interoperable, re-usable) principles (Wilkinson, et al., 2016) for the future.

Authors: Elaine ML Ho, Dimitrios Ladakis, Mark Basham, Michele C Darrow

Last Update: Dec 12, 2024

Language: English

Source URL: https://www.biorxiv.org/content/10.1101/2024.12.10.627814

Source PDF: https://www.biorxiv.org/content/10.1101/2024.12.10.627814.full.pdf

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to biorxiv for use of its open access interoperability.
