
Categories: Computer Science, Machine Learning, Computer Vision and Pattern Recognition

Improving Medical Image Classification with Active Label Cleaning

A new method enhances classification despite noisy labels and imbalanced datasets.



Tackling Noisy Labels in Medical Imaging: a method for robust medical image classification.

Medical image classification can greatly help diagnose diseases. However, training labels are often incorrect, which makes it challenging to train accurate models. This is especially true when some diseases are rare and have far fewer images. In this context, noisy labels, meaning incorrect labels, can cause a drop in model performance. This article discusses a method that aims to improve the training of classifiers in the presence of noisy labels and imbalanced datasets.

The Problem of Noisy Labels

In the real world, many factors can lead to noisy labels in medical images: poor-quality annotations, automated label generation, or labels taken from unreliable sources. This noise distorts the learning process, because the model tries to fit the incorrect training labels, which reduces its ability to perform well on new, unseen data.

In medical datasets, conditions vary in how common they are. Some diseases have many images available, while others have far fewer. For instance, a rare skin condition might contribute only a handful of images to the dataset, making it hard for the model to learn about it effectively. On such imbalanced data, traditional noisy-label methods often struggle: they tend to mistake the scarce minority-class samples for noisy ones.

Importance of Clean Labels

For accurate predictions, obtaining clean labels is crucial. A clean label is simply a correct label that accurately describes an image. If the model is trained with noisy labels, it might misclassify important images, especially those from minority classes. This means that special strategies are needed to identify and clean these labels, allowing the model to improve its performance gradually.

Active Label Cleaning Approach

To tackle the issue of noisy labels, a two-phase approach is proposed. The first phase focuses on training robustly even when labels are noisy. The second phase actively cleans those labels. By combining the two stages, the method improves classification performance significantly.

Phase 1: Learning with Noisy Labels

In the initial phase, the model is trained while accounting for the noise present in the labels. The idea is to learn which samples are likely to be clean and which are noisy by separating the labels based on their reliability. Standard methods, however, often fall short on imbalanced datasets, since they may mistakenly flag underrepresented samples as noisy. A common reliability signal is the per-sample training loss, as sketched below.
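
The article does not spell out the exact separation rule, so here is a minimal sketch of a widely used loss-based heuristic (popularized by LNL methods such as DivideMix): fit a two-component Gaussian mixture to the per-sample losses and treat the low-loss component as clean. The function name `split_clean_noisy` and the threshold are illustrative assumptions, not names from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def split_clean_noisy(per_sample_losses, clean_threshold=0.5):
    """Split sample indices into likely-clean and likely-noisy sets."""
    losses = np.asarray(per_sample_losses, dtype=np.float64).reshape(-1, 1)
    # Min-max normalize so the mixture fit is independent of loss scale.
    losses = (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)
    gmm = GaussianMixture(n_components=2, max_iter=100, reg_covar=5e-4)
    gmm.fit(losses)
    # The component with the smaller mean loss models the clean samples.
    clean_component = int(np.argmin(gmm.means_.ravel()))
    p_clean = gmm.predict_proba(losses)[:, clean_component]
    clean_idx = np.where(p_clean >= clean_threshold)[0]
    noisy_idx = np.where(p_clean < clean_threshold)[0]
    return clean_idx, noisy_idx
```

Note how a pure loss criterion penalizes minority classes: their samples tend to have higher losses simply because the model has seen them less often, which is exactly the failure mode the Variance of Gradients technique described later is meant to correct.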

Phase 2: Active Label Cleaning

After the first phase, the next step is to clean the noisy labels. An annotation budget is set that limits how many samples can be relabeled. An active learning sampler then selects the most informative samples to clean, as in the sketch below. The selected samples are sent to experts for relabeling, and the model is updated accordingly; by focusing the budget on key samples, the model improves significantly.
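
As a concrete illustration, here is a minimal sketch of entropy-based acquisition under a fixed budget (one of the strategies compared later in the article). `probs` is assumed to be the model's softmax outputs for the samples flagged as noisy in phase 1; the function name is hypothetical.

```python
import numpy as np

def select_for_relabeling(probs, budget):
    """Return the indices of the `budget` most uncertain samples."""
    probs = np.asarray(probs)
    # Predictive entropy: highest where the model is least certain,
    # so expert relabeling is likely to be most informative there.
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[::-1][:budget]
```

The returned indices are sent to the annotators; once the corrected labels come back, those samples move into the clean set and training resumes.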

Addressing Class Imbalance

The challenge of class imbalance arises when certain classes have far fewer samples. For example, in a dataset containing multiple skin conditions, one condition might have far fewer images than the others. To ensure that the model learns effectively, strategies should balance how the classes are represented during training; one standard option is sketched below.
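
The article does not specify the balancing mechanism, so this sketch shows one standard remedy: inverse-frequency class weights passed to the loss function, so that rare classes contribute more per sample. The helper name is an illustrative assumption.

```python
import numpy as np
import torch

def inverse_frequency_weights(labels, num_classes):
    """Weight each class inversely to its frequency in `labels`."""
    counts = np.bincount(labels, minlength=num_classes).astype(np.float64)
    weights = counts.sum() / (num_classes * np.maximum(counts, 1.0))
    return torch.tensor(weights, dtype=torch.float32)

# Example: 8 skin-condition classes with a long-tailed label list.
# criterion = torch.nn.CrossEntropyLoss(
#     weight=inverse_frequency_weights(train_labels, num_classes=8))
```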

Variance Of Gradients

One novel technique introduced in this approach is the Variance of Gradients (VOG). While traditional methods rely on a sample's loss to decide whether it is clean or noisy, VOG instead analyzes how a sample's gradients change over the course of training. This identifies underrepresented samples more accurately and ensures that minority classes are not discarded as noise during training. A minimal sketch follows.
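
The sketch below follows the original VOG formulation by Agarwal et al.: for each sample, take the gradient of the true-class score with respect to the input at several training checkpoints, then measure how much that gradient varies across checkpoints. How the paper integrates this score into its sample selection is not detailed here, and checkpoint loading and batching are omitted.

```python
import torch

def vog_score(checkpoints, x, y):
    """Variance of input gradients for one sample across checkpoints.

    `checkpoints` is a list of the same network saved at different
    training stages; `x` is one input image tensor and `y` its
    (possibly noisy) integer label.
    """
    grads = []
    for model in checkpoints:
        model.eval()
        inp = x.clone().detach().requires_grad_(True)
        logits = model(inp.unsqueeze(0))  # shape (1, num_classes)
        # Gradient of the class-y score with respect to the input pixels.
        logits[0, y].backward()
        grads.append(inp.grad.detach().clone())
    grads = torch.stack(grads)            # (num_checkpoints, *x.shape)
    # Per-pixel variance across checkpoints, averaged into one scalar.
    return grads.var(dim=0, unbiased=False).mean().item()
```

Intuitively, samples the model keeps "changing its mind" about receive high variance; combined with the loss signal, this helps distinguish genuinely mislabeled samples from merely underrepresented ones.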

Datasets Used

The effectiveness of the proposed method is shown on two datasets: ISIC-2019 and NCT-CRC-HE-100K. The ISIC-2019 dataset contains images of skin diseases, while the NCT-CRC-HE-100K dataset contains histopathology images. Both exhibit significant class imbalance, providing a suitable testbed for how well the method performs in real-world settings.

ISIC-2019 Dataset

This dataset comprises over 25,000 images of various skin diseases, which are divided into training, validation, and test sets. The distribution among the classes is uneven, leading to challenges when training classifiers. The goal remains to ensure that the model learns effectively across all represented conditions despite the imbalance.

NCT-CRC-HE-100K Dataset

The long-tailed NCT-CRC-HE-100K dataset is another key data source, with roughly 100,000 histopathology images. Like ISIC-2019, it suffers from class imbalance, allowing a thorough evaluation of the proposed method's ability to manage noisy labels.

Experiments and Results

To validate the effectiveness of the proposed method, various experiments were conducted. The performance of the active label cleaning approach was compared against several baseline methods.

Active Learning Comparison

Different active learning strategies were tested, including random sampling and entropy-based sampling (the random baseline is sketched below). The goal was to see how well these strategies could select samples for relabeling and improve the model's performance. Results showed that starting from a model trained on noisy data was generally less effective than starting from the clean samples identified by the proposed method.
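
For contrast with the entropy-based sketch earlier, the random baseline simply spends the budget uniformly over the noisy pool. The values for `noisy_idx` and `budget` here are toy stand-ins, not figures from the paper.

```python
import numpy as np

noisy_idx = np.arange(1000)  # toy stand-in for phase-1 noisy indices
budget = 50                  # toy annotation budget

rng = np.random.default_rng(seed=0)
random_pick = rng.choice(noisy_idx, size=budget, replace=False)
print(random_pick[:10])
```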

Conclusion

The proposed two-phase approach combining learning with noisy labels and active label cleaning demonstrates significant improvements in medical image classification tasks, especially in handling noisy labels and class imbalance. By effectively relabeling important samples and using innovative techniques like Variance of Gradients, the method presents a practical way to enhance the robustness of classifiers in the face of label noise.

In summary, the key takeaways include the importance of clean labels, the effectiveness of active learning in cleaning noisy labels, and the benefits of addressing class imbalance. By focusing on these areas, medical image classification can become more accurate, ultimately aiding in better diagnosis and treatment of various health conditions.

Original Source

Title: Active Label Refinement for Robust Training of Imbalanced Medical Image Classification Tasks in the Presence of High Label Noise

Abstract: The robustness of supervised deep learning-based medical image classification is significantly undermined by label noise. Although several methods have been proposed to enhance classification performance in the presence of noisy labels, they face some challenges: 1) a struggle with class-imbalanced datasets, leading to the frequent overlooking of minority classes as noisy samples; 2) a singular focus on maximizing performance using noisy datasets, without incorporating experts-in-the-loop for actively cleaning the noisy labels. To mitigate these challenges, we propose a two-phase approach that combines Learning with Noisy Labels (LNL) and active learning. This approach not only improves the robustness of medical image classification in the presence of noisy labels, but also iteratively improves the quality of the dataset by relabeling the important incorrect labels, under a limited annotation budget. Furthermore, we introduce a novel Variance of Gradients approach in the LNL phase, which complements the loss-based sample selection by also sampling under-represented samples. Using two imbalanced noisy medical classification datasets, we demonstrate that our proposed technique is superior to its predecessors at handling class imbalance by not misidentifying clean samples from minority classes as mostly noisy samples.

Authors: Bidur Khanal, Tianhong Dai, Binod Bhattarai, Cristian Linte

Last Update: 2024-10-24

Language: English

Source URL: https://arxiv.org/abs/2407.05973

Source PDF: https://arxiv.org/pdf/2407.05973

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arxiv for use of its open access interoperability.
