Simple Science

Cutting-edge science explained simply

What does "Counterfactually Augmented Data" mean?


Counterfactually Augmented Data (CAD) is a way of creating new examples for training machine learning models. Each new example is made by editing an existing one as little as possible, but just enough to change its correct label or category. For example, if an image of a cat is labeled "cat," a small edit that makes the animal look like a dog produces a new example labeled "dog." The original and the edited version form a pair that differs only in the details that actually decide the label.
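CAD is commonly applied to text, such as movie reviews labeled positive or negative. The toy sketch below shows the shape of the idea on text. In real CAD, people rewrite each example by hand. Here a hypothetical word-swap table (`SWAPS`) stands in for them. The names `Example`, `make_counterfactual` and `augment` are illustrative and do not come from any library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Example:
    text: str
    label: str

# Hypothetical word-swap rules standing in for the human annotators who,
# in real CAD, make minimal label-flipping edits by hand.
SWAPS = {"great": "terrible", "terrible": "great"}
FLIP = {"positive": "negative", "negative": "positive"}

def make_counterfactual(example):
    """Return a minimally edited copy with the flipped label, or None if no rule applies."""
    words = example.text.split()
    for i, word in enumerate(words):
        if word in SWAPS:
            # Change exactly one word and keep everything else identical.
            edited = words[:i] + [SWAPS[word]] + words[i + 1:]
            return Example(" ".join(edited), FLIP[example.label])
    return None

def augment(dataset):
    """Return the original examples followed by every counterfactual that could be made."""
    counterfactuals = [make_counterfactual(ex) for ex in dataset]
    return list(dataset) + [cf for cf in counterfactuals if cf is not None]

data = [Example("the acting was great", "positive")]
for ex in augment(data):
    print(ex.label, "|", ex.text)
# positive | the acting was great
# negative | the acting was terrible
```

Each augmented pair differs in a single word, and that word is the one responsible for the label change.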

Purpose of CAD

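The goal of CAD is to help models learn which features really matter by showing them examples that are nearly identical but carry different labels. Each original example and its counterfactual share almost everything except the details that decide the label. As a result, features that merely happen to co-occur with a label stop being reliable predictors. An example is a background that appears mostly in cat photos. This discourages models from relying on such spurious correlations and can make them more robust and accurate.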
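The toy example below illustrates this effect using made-up review snippets. In the original data, the name "spielberg" happens to appear only in a positive review, so a naive model could treat it as a sign of positivity. The helper `positive_rate` is hypothetical. It measures how often examples that contain a word are labeled positive.

```python
from collections import Counter

def positive_rate(dataset, word):
    """Fraction of (text, label) examples containing `word` that are labeled positive."""
    labels = [label for text, label in dataset if word in text.split()]
    if not labels:
        return None
    return Counter(labels)["positive"] / len(labels)

# Made-up original data: "spielberg" co-occurs only with the positive label.
original = [
    ("spielberg film was great", "positive"),
    ("the plot was terrible", "negative"),
]
# Counterfactuals: only the sentiment word changes, and the label flips with it.
counterfactuals = [
    ("spielberg film was terrible", "negative"),
    ("the plot was great", "positive"),
]

print(positive_rate(original, "spielberg"))                    # 1.0
print(positive_rate(original + counterfactuals, "spielberg"))  # 0.5
print(positive_rate(original + counterfactuals, "great"))      # 1.0
```

After augmentation, the incidental word no longer says anything about the label. The sentiment word, which is what the edits targeted, still does.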

Challenges with CAD

While CAD can improve learning, it also has drawbacks. Models trained on CAD may focus too heavily on the specific features that were edited and overlook other features that also matter for the label. This can hurt performance on new data that differs from what the models saw during training.

Improving with Contrastive Learning

To address these challenges, researchers have turned to contrastive learning. This technique trains a model to place the representations of similar examples close together and those of dissimilar examples far apart. Applied to CAD, it encourages the model to draw on a wider range of features, not just the ones that were edited. This balances the model's focus and helps it perform better on new or different types of data.
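A common contrastive objective is the InfoNCE loss. It is low when an "anchor" representation is more similar to its designated positive than to a set of negatives. The minimal sketch below is a generic version of that loss and is not the specific method from the research this article summarizes. It assumes the embeddings are already computed as plain lists of floats. The pairing in the usage lines is only one illustrative choice. The anchor is a review, the positive is another review with the same label, and the negative is the anchor's counterfactual.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log softmax score of the positive among all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]

# Hypothetical 2-D embeddings.
anchor = [1.0, 0.2]     # an original review
positive = [0.9, 0.3]   # another review with the same label
negative = [-1.0, 0.2]  # the anchor's counterfactual (label flipped)

good = info_nce(anchor, positive, [negative])
bad = info_nce(anchor, negative, [positive])
print(good < bad)  # True
```

Minimizing this loss over many such triples pulls same-label representations together and pushes different-label ones apart. The `temperature` value controls how sharply the loss separates close and distant candidates.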
