Simple Science

Cutting edge science explained simply

# Computer Science / Computer Vision and Pattern Recognition

Advancing Defect Detection with Visual Prompting

A new method to enhance industrial defect detection accuracy.

Geonuk Kim

― 6 min read


[Figure: Improving Defect Detection Accuracy. A new method reduces errors in predicting industrial defects.]

In the world of industrial defect detection, most systems rely on supervised learning. This means they are trained to recognize specific types of defects using a labeled set of images. These models work well when they know what to expect, but they struggle when they encounter new or different kinds of defects. This leads to the need for constant updates and retraining, which can be time-consuming and expensive.

Recent developments in machine learning have introduced a method called Visual Prompting. This technique allows models to understand and classify defects based on visual clues instead of being strictly tied to pre-defined categories. By using images as prompts during the decision-making process, models can adapt to new defects more flexibly.

The Challenge of Overconfidence

One major challenge with visual prompting is that models often become overconfident in their predictions. This means they might incorrectly label unknown objects as known defects with high certainty. This overconfidence can lead to mistakes and misclassifications, which is a serious issue in industrial environments where accuracy is crucial.

To solve this problem, it is important to assess how confident a model really is in its predictions. Doing this allows us to identify situations where the model might be making errors or where it is less reliable.

Our Proposed Solution

To tackle the overconfidence issue, we propose a method that estimates the uncertainty in the visual prompting process. The key idea is to check if the model can correctly restore the original prompts from its predictions. Essentially, if the model is confident and accurate in its decisions, it should be able to go back and recreate the initial prompts correctly.

We measure how well the model does this using a metric called the Mean Intersection Over Union (mIoU). This metric helps us to compare the predicted results with the original prompts to see how closely they match.
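For binary masks, this metric is straightforward to compute. The following is a minimal NumPy sketch (the function names are ours, not from the paper):

```python
import numpy as np

def binary_iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over Union between two binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    intersection = np.logical_and(pred, target).sum()
    return float(intersection / union)

def mean_iou(pairs) -> float:
    """Mean IoU over a list of (restored, original) mask pairs."""
    return float(np.mean([binary_iou(p, t) for p, t in pairs]))
```

A restored mask identical to the original scores 1.0; a mask with no overlap scores 0.0, so higher mIoU means more faithful prompt restoration.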

By focusing on this cycle of checking and restoring prompts, we can effectively gauge the reliability of the model's predictions. This confidence estimation can help to reduce errors and improve the model's performance, especially in industrial settings where new defects often arise.

The Role of Baseline Methods

To evaluate our approach, we used a baseline method known as Dinov, which is based on an encoder-decoder structure. This method helps to process images and make predictions. The baseline involves encoding the visual prompts from reference images and then using a shared decoder to interpret these prompts in the context of new images.

However, one limitation of Dinov is that it can become biased toward defects that it has seen before. This can impair its ability to deal with new defects effectively. By employing our proposed Cycle-consistency method, we can help the model to be more reliable, reducing bias and improving its adaptability in real-world scenarios.

How Our Method Works

Our method consists of two main phases: the forward phase and the reverse phase.

Forward Phase

In the forward phase, we start with a support image and its corresponding prompt mask. We also have a query image that we want to analyze. The goal here is to identify which parts of the query image match the prompt from the support image. This process results in a mask map, which indicates the detected regions in the query image.

Reverse Phase

In the reverse phase, we take the output of the forward phase, specifically the query image and its generated mask, and treat them as the new support image and mask. The original support image becomes the new query image. This step lets us check whether the original mask can be regenerated accurately.

By comparing the original mask with the mask generated in the reverse phase, we can gauge the model's reliability. If the restored mask closely matches the original, it indicates that the model is making unbiased predictions.
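The two phases can be sketched in a few lines. Here `segment` is a hypothetical placeholder standing in for the visual prompting model's forward pass, not an API from the paper:

```python
import numpy as np

def cycle_consistency_score(segment, support_img, support_mask, query_img):
    """Run the forward and reverse phases and score prompt restoration.

    `segment(support_img, support_mask, query_img)` is assumed to return
    the predicted binary mask for the query image, given the visual prompt.
    """
    # Forward phase: prompt with the support pair, predict on the query.
    query_mask = segment(support_img, support_mask, query_img)
    # Reverse phase: swap roles and try to restore the original mask.
    restored_mask = segment(query_img, query_mask, support_img)
    # Score restoration against the original prompt mask with IoU.
    inter = np.logical_and(restored_mask, support_mask).sum()
    union = np.logical_or(restored_mask, support_mask).sum()
    return 1.0 if union == 0 else float(inter / union)
```

A score near 1.0 suggests an unbiased, reliable prediction; a low score flags a prediction the model could not cycle back from.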

Image Processing Techniques

To improve our model's prediction accuracy, we use a powerful image feature extractor called Swin-L. This architecture comes with weights pre-trained on large datasets, allowing it to analyze images effectively.

Moreover, we apply various data augmentation techniques. These methods are crucial in industrial inspection contexts since they help in accounting for variations in lighting while keeping color changes minimal. We adjust the brightness, contrast, and saturation of images and perform horizontal flips during training to enhance the model’s robustness.
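A lightweight NumPy sketch of such an augmentation pipeline is shown below. The jitter ranges are illustrative, not the paper's actual values, and the functions are our own stand-ins for what a library such as torchvision would provide:

```python
import numpy as np

rng = np.random.default_rng(0)

def photometric_jitter(img, brightness=0.2, contrast=0.2, saturation=0.2):
    """Mild brightness/contrast/saturation jitter on a float RGB image in [0, 1].

    Small jitter ranges model lighting variation while keeping colour
    changes minimal, as favoured in industrial inspection.
    """
    img = img * rng.uniform(1 - brightness, 1 + brightness)             # brightness
    mean = img.mean()
    img = (img - mean) * rng.uniform(1 - contrast, 1 + contrast) + mean  # contrast
    gray = img.mean(axis=-1, keepdims=True)
    img = gray + (img - gray) * rng.uniform(1 - saturation, 1 + saturation)  # saturation
    return np.clip(img, 0.0, 1.0)

def random_hflip(img, p=0.5):
    """Horizontal flip with probability p (axis 1 is width)."""
    return img[:, ::-1] if rng.random() < p else img
```

In practice these transforms would be applied per training sample, with the flip doubling the effective variety of defect orientations.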

Single Model Approach

Many competitors in the field rely on multiple models to boost performance. However, due to resource limits, we chose to focus on refining a single visual prompting model. Our strategy emphasizes estimating confidence scores to determine how trustworthy the predictions are, rather than building model ensembles.

Evaluation of the Method

To validate our approach, we tested it on the VISION24 one-shot industrial inspection dataset, which consists of thousands of images. This dataset includes various categories of products, each with known and unknown defect types. Our evaluation considered two critical aspects: the positive pair catch rate and the negative pair yield rate.

A positive pair is deemed a success if the predicted mask matches the ground truth well. For negative pairs, we consider it a correct yield if the model's response rate is below a certain threshold.
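These two criteria can be sketched as a single scoring function. The threshold values here are illustrative placeholders, not the challenge's actual settings:

```python
def catch_and_yield_rates(positive_ious, negative_response_rates,
                          iou_thresh=0.5, response_thresh=0.1):
    """Evaluation sketch for one-shot inspection pairs.

    A positive pair counts as caught when the predicted mask overlaps the
    ground truth well (IoU above iou_thresh). A negative pair counts as a
    correct yield when the model's response rate stays below response_thresh.
    """
    catch = sum(iou > iou_thresh for iou in positive_ious) / len(positive_ious)
    yld = (sum(r < response_thresh for r in negative_response_rates)
           / len(negative_response_rates))
    return catch, yld
```

A high catch rate shows real defects are found; a high yield rate shows defect-free parts are not falsely rejected.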

Implementation Insights

Our training set encompasses five categories, including Cable, Cylinder, and PCB, each with different defects. For instance, the Cable category contains defects such as thunderbolt and torn-apart. Although they belong to the same primary category, we treated each defect type as an independent class, leading to a total of 12 classes.

Training involved resizing the images and using the same data augmentation techniques previously mentioned. The Dinov network was trained over 20,000 iterations on 8 GPUs with a specific batch size, using the AdamW optimizer.

When the confidence score from the model was above a certain value, we accepted the predicted mask; if it was lower, we marked it as unreliable.
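This acceptance rule amounts to a simple gate on the cycle-consistency confidence. The threshold of 0.5 below is an assumed example value, not the one used in the paper:

```python
def gate_prediction(pred_mask, confidence, threshold=0.5):
    """Accept the predicted mask only when the cycle-consistency
    confidence clears the threshold; otherwise flag it as unreliable."""
    if confidence >= threshold:
        return pred_mask, True   # trusted prediction
    return None, False           # rejected as unreliable
```

Rejected predictions can then be routed to manual inspection instead of silently entering the pipeline.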

Results and Findings

Our method achieved a yield rate of 0.9175, indicating its effectiveness without needing specialized network designs or complex ensemble techniques. We observed a significant reduction in false positives thanks to the cycle-consistency approach.

Furthermore, qualitative evaluations revealed cases where our model accurately restored the support mask through both phases. In instances where predictions were accurate, our model demonstrated high mIoU scores, confirming its capability to successfully adapt to new defect types.

Conversely, in cases with lower mIoU scores, the model correctly identified weaknesses and avoided accepting biased predictions. This ability to discern between accurate and unreliable predictions is vital in industrial settings, where maintaining high accuracy is essential.

Conclusion

The method we propose represents a substantial step forward in industrial defect detection. By integrating visual prompting with cycle-consistency uncertainty estimation, our approach effectively reduces the risks associated with overconfidence. By ensuring that models can reliably restore original prompts, we enhance their adaptability to new types of defects while minimizing errors.

As we continue to refine our techniques and explore how to further improve performance, it is clear that this innovative approach can significantly benefit industries that regularly face new and unforeseen challenges. As technology improves, we can expect even more effective solutions for managing defects in various industrial domains.

Original Source

Title: Cycle-Consistency Uncertainty Estimation for Visual Prompting based One-Shot Defect Segmentation

Abstract: Industrial defect detection traditionally relies on supervised learning models trained on fixed datasets of known defect types. While effective within a closed set, these models struggle with new, unseen defects, necessitating frequent re-labeling and re-training. Recent advances in visual prompting offer a solution by allowing models to adaptively infer novel categories based on provided visual cues. However, a prevalent issue in these methods is the over-confidence problem, where models can misclassify unknown objects as known objects with high certainty. To address these fundamental concerns about adaptability, we propose a solution that estimates the uncertainty of the visual prompting process via cycle-consistency: we check whether the model can accurately restore the original prompt from its predictions. To quantify this, we measure the mean Intersection over Union (mIoU) between the restored prompt mask and the originally provided prompt mask. Without using complex designs or ensemble methods with multiple networks, our approach achieved a yield rate of 0.9175 in the VISION24 one-shot industrial challenge.

Authors: Geonuk Kim

Last Update: 2024-09-20 00:00:00

Language: English

Source URL: https://arxiv.org/abs/2409.13984

Source PDF: https://arxiv.org/pdf/2409.13984

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arXiv for use of its open access interoperability.
