Introducing the NuInsSeg Dataset for Nuclei Segmentation

A new dataset to improve automated nuclei segmentation in tissue images.

In medical research, analyzing images of tissue is essential. One key task is to identify and outline every cell nucleus in an image, a process known as nuclei instance segmentation. As technology advances, automated methods are increasingly used to analyze these images quickly and accurately. To train such systems, however, we need many example images with detailed annotations showing where each nucleus is located.

The Need for Annotated Datasets

Creating datasets that contain images of tissues with properly marked nuclei can be difficult, especially in medicine. Many researchers rely on deep learning methods, which are a type of artificial intelligence, to perform nuclei segmentation. These methods have proven to be more effective than older techniques. But they require large amounts of fully annotated data to perform well.

The NuInsSeg Dataset

To address the need for annotated datasets, we present the NuInsSeg dataset. It is one of the largest fully manually annotated datasets of tissue images stained with Hematoxylin and Eosin (H&E). The dataset includes 665 image patches with more than 30,000 annotated nuclei collected from 31 different human and mouse organs. In addition, we provide masks for ambiguous areas, where exact annotation is not possible even for experts. This extra information can help researchers better understand the challenges in nuclei segmentation.

Background on Nuclei Segmentation

The technology used to scan and digitize tissue images is continually improving. This has led to more interest in using computer methods for analyzing complete slide images. Nuclei segmentation is a core part of understanding these images, as it helps in identifying critical features related to cells and tissues. Factors like the number of nuclei in an area or the size of the nuclei compared to the surrounding cell material can influence diagnoses, such as determining cancer severity.
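As a rough illustration of how such features can be read off a segmentation, the sketch below counts nuclei and measures their areas from an instance label mask. The encoding (0 for background, one positive integer per nucleus) and the function name `nuclei_stats` are assumptions made for this example, not something specified by the dataset. The nuclear area fraction of the patch is used here only as a crude stand-in for nucleus-to-cell size comparisons, since cell boundaries are not part of the mask.

```python
from collections import Counter


def nuclei_stats(label_mask):
    """Return (nucleus count, mean nucleus area in pixels, nuclear area fraction).

    label_mask: 2D list of ints. Assumed encoding: 0 is background and each
    positive integer marks the pixels of one nucleus.
    """
    # Pixel count per nucleus label, ignoring background.
    areas = Counter(v for row in label_mask for v in row if v != 0)
    total_pixels = sum(len(row) for row in label_mask)
    count = len(areas)
    nuclear_pixels = sum(areas.values())
    mean_area = nuclear_pixels / count if count else 0.0
    fraction = nuclear_pixels / total_pixels if total_pixels else 0.0
    return count, mean_area, fraction


# Toy 4x5 patch with three nuclei (labels 1, 2 and 3).
toy_mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 0, 2],
    [3, 3, 0, 0, 0],
]
count, mean_area, fraction = nuclei_stats(toy_mask)
print(count, mean_area, fraction)
```

In practice the mask would be loaded from an image file and areas converted to physical units using the scanner's pixel size.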

Experts who manually annotate these images face significant challenges. They must carefully identify and outline thousands of nuclei in even small image patches. Factors like tissue folds, out-of-focus areas, and differences in staining can make it difficult to achieve precise annotations. Studies have shown that different experts may not agree on how to annotate the same image, indicating the inherent challenges in the process.

Existing Datasets and Their Limitations

There are several datasets available for nuclei segmentation, especially for H&E-stained images, which are the most common type used in pathology. While these datasets have contributed positively to research, there remains a strong need for additional datasets that cover a wider variety of tissues. This additional variation can help researchers create better algorithms for segmentation.

Some datasets have been created using semi-automatic methods, involving trained computer systems to assist in the annotation process. However, these methods can introduce biases based on the models used to generate them. For this reason, we focus on fully manual annotations in our dataset to avoid these potential issues.

Details of the NuInsSeg Dataset

The NuInsSeg dataset includes brightfield images and was developed using tissue samples from various human and mouse organs. The images were collected by scanning stained tissue sections using advanced imaging technology. Instead of whole slide images, we worked with individual fields of view, which were carefully selected to represent each type of tissue.

In total, 665 image patches were created. The segmentation process was carried out by trained individuals who accurately outlined the nuclei for each image. We avoided using quick semi-automatic methods for annotation to ensure that the segmentations closely resembled those that would be provided by human experts.

Along with the main annotated images, we also created several types of additional segmentation masks. These include binary masks, which indicate where nuclei are, and other auxiliary masks that could be useful for advanced computer-based segmentation approaches. For the first time, we also annotated ambiguous areas in the images where identifying nuclei is particularly difficult. This added information can be very helpful for researchers analyzing the performance of segmentation algorithms.
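To make the relationship between these mask types concrete, here is a minimal sketch that derives a binary mask from an instance label mask and then flags ambiguous pixels so they can be ignored later. The encodings (0 for background, positive integers for nuclei, nonzero for ambiguous pixels), the `-1` ignore value, and both function names are assumptions for illustration; the dataset's own mask-generation steps are described in its repository.

```python
def to_binary_mask(label_mask):
    """Collapse an instance label mask into a 0/1 nuclei mask."""
    return [[1 if v != 0 else 0 for v in row] for row in label_mask]


def exclude_ambiguous(binary_mask, ambiguous_mask):
    """Mark ambiguous pixels with -1 so that evaluation code can skip them."""
    return [
        [-1 if amb else val for val, amb in zip(bin_row, amb_row)]
        for bin_row, amb_row in zip(binary_mask, ambiguous_mask)
    ]


# Toy example: two nuclei (labels 1 and 2) and one ambiguous column.
labels = [
    [0, 1, 1, 0],
    [0, 0, 2, 2],
]
ambiguous = [
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
binary = to_binary_mask(labels)
masked = exclude_ambiguous(binary, ambiguous)
print(binary)
print(masked)
```

One possible use is to score a model only on pixels that are not marked `-1`, so that regions no expert can annotate reliably do not count for or against it.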

The Importance of Technical Validation

To evaluate our dataset and ensure its reliability, we split it into several parts for testing and training models. By doing this, we can develop a baseline for comparing different segmentation algorithms. Various deep learning models were tested using our dataset to see how well they performed in identifying and segmenting nuclei.
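A split of this kind can be sketched as a k-fold partition of the 665 patch indices. The choice of five folds, the fixed seed, and the function name `k_fold_splits` are assumptions for illustration; the summary does not state how many parts were used.

```python
import random


def k_fold_splits(n_items, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 665 image patches, as in the dataset; k=5 is an assumed value.
splits = list(k_fold_splits(665, k=5))
for fold_number, (train, test) in enumerate(splits):
    print(fold_number, len(train), len(test))
```

Every patch lands in exactly one test fold, so each model is evaluated on images it never saw during training. A real experiment might instead split by organ or by donor to avoid related patches appearing on both sides.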

We used different performance metrics to measure how well the models worked. For example, we looked at scores that show how accurately the models matched the manual annotations. The results showed that one specific model was particularly effective at segmenting the nuclei.
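The summary does not name the metrics, so the following is only an example of one widely used overlap score, the Dice coefficient: Dice = 2|A ∩ B| / (|A| + |B|), where A is the set of predicted nucleus pixels and B is the set of manually annotated ones. The function name and the convention of returning 1.0 when both masks are empty are assumptions of this sketch.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as 2D lists."""
    intersection = pred_sum = target_sum = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p, t = int(bool(p)), int(bool(t))
            intersection += p * t
            pred_sum += p
            target_sum += t
    if pred_sum + target_sum == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0
    return 2 * intersection / (pred_sum + target_sum)


prediction = [
    [1, 1, 0],
    [0, 1, 0],
]
annotation = [
    [1, 0, 0],
    [0, 1, 1],
]
score = dice_score(prediction, annotation)
print(round(score, 4))
```

Dice measures pixel-level overlap only. Instance-level metrics, which also check whether touching nuclei are separated correctly, need the per-nucleus labels rather than a binary mask.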

Making the Dataset Accessible

The NuInsSeg dataset is made available for public use. This is important because it allows researchers to download and use the images and annotations for their own studies. Having access to this dataset means that anyone working in the field can develop and test their segmentation algorithms without having to create their own datasets from scratch.

Researchers can utilize the dataset in various ways. They can train new models, validate existing ones, or even perform analysis to better understand the characteristics of the tissues they are studying. The dataset will serve as a valuable resource in the ongoing research to improve nuclei segmentation methods.

Conclusion

The NuInsSeg dataset represents a substantial contribution to the field of computational pathology. By providing a large collection of fully annotated images of nuclei in H&E-stained histological tissues, we are helping to advance the development of more accurate and efficient segmentation algorithms. This dataset not only addresses a significant need in research but also opens the door for further studies and enhancements in the analysis of tissue images.

The challenges faced in this field are many, but with resources like the NuInsSeg dataset, researchers can work towards creating better automated tools for analyzing tissue images. The information provided through this dataset will help to refine the understanding of nuclei segmentation, ultimately aiding in the diagnosis and treatment of diseases through improved image analysis techniques.

Original Source

Title: NuInsSeg: A Fully Annotated Dataset for Nuclei Instance Segmentation in H&E-Stained Histological Images

Abstract: In computational pathology, automatic nuclei instance segmentation plays an essential role in whole slide image analysis. While many computerized approaches have been proposed for this task, supervised deep learning (DL) methods have shown superior segmentation performances compared to classical machine learning and image processing techniques. However, these models need fully annotated datasets for training which is challenging to acquire, especially in the medical domain. In this work, we release one of the biggest fully manually annotated datasets of nuclei in Hematoxylin and Eosin (H&E)-stained histological images, called NuInsSeg. This dataset contains 665 image patches with more than 30,000 manually segmented nuclei from 31 human and mouse organs. Moreover, for the first time, we provide additional ambiguous area masks for the entire dataset. These vague areas represent the parts of the images where precise and deterministic manual annotations are impossible, even for human experts. The dataset and detailed step-by-step instructions to generate related segmentation masks are publicly available at https://www.kaggle.com/datasets/ipateam/nuinsseg and https://github.com/masih4/NuInsSeg, respectively.

Authors: Amirreza Mahbod, Christine Polak, Katharina Feldmann, Rumsha Khan, Katharina Gelles, Georg Dorffner, Ramona Woitek, Sepideh Hatamikia, Isabella Ellinger

Last Update: 2023-08-03 00:00:00

Language: English

Source URL: https://arxiv.org/abs/2308.01760

Source PDF: https://arxiv.org/pdf/2308.01760

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arXiv for use of its open access interoperability.
