
Computer Science / Computation and Language

Introducing Snap: A New Way for LLMs to Forget

Snap helps large language models unlearn specific information while preserving their overall performance.


Large language models (LLMs) like ChatGPT have become part of many people's everyday lives. While these models can be helpful, they sometimes reveal personal or copyrighted information. This creates a need for a way to "unlearn" selected knowledge, that is, to remove it from the model without losing its overall abilities.

Previous efforts to make models forget specific information often resulted in the models giving strange or incorrect responses when asked about that information. This can make using the models frustrating for users. To address this issue, we introduce a new framework called Snap that aims to effectively remove unwanted knowledge while keeping the model's performance intact.

What is Machine Unlearning?

Machine unlearning refers to the process of teaching a trained machine learning model to forget specific pieces of information. Concern about privacy is growing, particularly in light of regulations such as the Right to be Forgotten in Europe and similar laws in the United States. Companies need ways to erase personal information when requested.

Additionally, there is concern about copyrighted content generated by LLMs. Existing unlearning methods frequently attempt to disconnect certain data from related information, but this can lead to models outputting confusing answers. Our method aims to ensure that the model simply does not answer questions about the information we want it to forget.

The Challenge of Unlearning

Unlearning is complex. It involves changing a model that has millions or even billions of parameters. One way to ensure a model forgets information is to completely retrain it from scratch without the data to be removed. However, this is often too expensive and time-consuming, especially with large models.

As LLMs gain popularity, there is a growing interest in finding quicker ways to unlearn information. Research into machine unlearning has traditionally focused on image processing tasks, but the rise of LLMs brings similar concerns in natural language processing (NLP).

Our Approach: Snap

Our framework, Snap, is designed to help LLMs unlearn selective information while keeping their original abilities. The method involves several steps:

  1. Negative Instructions: We create a set of instructions that guide the model to produce responses indicating it has forgotten certain knowledge.
  2. Hard Retaining Data Augmentation: We generate additional instruction data related to the knowledge we want to keep, ensuring that the model distinguishes between what to forget and what to remember.
  3. Wasserstein Regularization: This technique helps ensure that changes made to the model during training do not overly affect its abilities.

Through these steps, we can effectively remove information tied to a specific entity, such as a person, while still allowing the model to answer other questions accurately.

Creating Negative Instructions

To begin with, we develop a set of negative instructions that tell the model what to forget. We automate this process by using LLMs to generate questions related to the information we want to erase. Each question is then paired with a response stating that the model cannot answer.

We filter these questions to ensure diversity, removing duplicates so that only distinct variations remain. The result is a set of high-quality instructions that guide the model to produce obliterated responses.
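As a concrete illustration, here is a minimal sketch of how such a negative instruction set could be assembled. In the actual pipeline the questions come from prompting an LLM; the hand-written questions, the refusal text, and the `build_negative_instructions` helper below are illustrative stand-ins, not the paper's code.

```python
# Toy sketch: build "negative instructions" for one entity by pairing
# (LLM-generated) questions with a refusal answer and filtering duplicates.
from dataclasses import dataclass

REFUSAL = "I'm sorry, but I don't have any information about that."  # illustrative refusal text

@dataclass
class Instruction:
    question: str
    answer: str

def build_negative_instructions(questions: list[str]) -> list[Instruction]:
    seen = set()
    negatives = []
    for q in questions:
        key = q.lower().strip()      # crude deduplication to keep only distinct variations
        if key in seen:
            continue
        seen.add(key)
        negatives.append(Instruction(question=q, answer=REFUSAL))
    return negatives

if __name__ == "__main__":
    generated = [
        "Who is Peter Parker?",
        "who is peter parker?",                    # near-duplicate, filtered out
        "What does Peter Parker do for a living?",
        "Where does Peter Parker live?",
    ]
    for item in build_negative_instructions(generated):
        print(item.question, "->", item.answer)
```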

Hard Retaining Data Augmentation

Simply telling the model to forget something can lead to it forgetting related information that should be retained. To prevent this, we add a layer of hard retaining data augmentation. Here, we construct a second set of instructions that asks questions related to the information we want to keep.

By training the model on both negative instructions and hard retaining data, we help it learn the distinction between what needs to be forgotten versus what should be remembered. This dual approach ensures a more balanced outcome during the unlearning process.
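To make the pairing concrete, here is a small sketch of a combined training mix, with each example tagged as belonging to the forget split or the retain split. The questions and answers are made up for illustration and are not entries from the paper's dataset.

```python
# Toy sketch: tag forget and retain examples so training can treat them differently.
FORGET_QA = [
    {"question": "Who is Peter Parker?",
     "answer": "I'm sorry, but I don't have any information about that."},
]

RETAIN_QA = [
    {"question": "What company publishes Spider-Man comics?",
     "answer": "Marvel Comics publishes Spider-Man comics."},
    {"question": "In which city are most Spider-Man stories set?",
     "answer": "Most Spider-Man stories are set in New York City."},
]

def build_training_mix(forget_qa, retain_qa):
    """Combine both splits, keeping a label so the loss can distinguish them."""
    return ([dict(item, split="forget") for item in forget_qa] +
            [dict(item, split="retain") for item in retain_qa])

if __name__ == "__main__":
    for example in build_training_mix(FORGET_QA, RETAIN_QA):
        print(example["split"], "|", example["question"])
```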

Implementing Wasserstein Regularization

Wasserstein regularization acts as a safeguard. It controls how much the model’s parameters change during training, ensuring the model maintains its overall performance. This technique measures the cost of changing the model's parameters and seeks to minimize unnecessary alterations.

Using this approach helps us manage how much we modify the model, allowing it to retain its capabilities while also achieving the desired unlearning.
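To show roughly where such a penalty enters the training loss, here is a simplified sketch. The paper derives the regularizer from the Wasserstein (optimal transport) distance to the model's initial parameters; the version below collapses that to a plain per-parameter L1 distance from a frozen snapshot, which only illustrates the idea of penalizing drift rather than reproducing the exact formulation.

```python
# Simplified sketch: penalize how far the parameters move from a frozen
# snapshot of the pre-unlearning model while optimizing the unlearning loss.
import copy
import torch

def parameter_drift_penalty(model: torch.nn.Module,
                            initial_model: torch.nn.Module) -> torch.Tensor:
    """Sum of absolute parameter changes relative to the initial weights."""
    penalty = torch.zeros(())
    for p, p0 in zip(model.parameters(), initial_model.parameters()):
        penalty = penalty + (p - p0.detach()).abs().sum()
    return penalty

# Tiny stand-in model; a real run would use the LLM itself.
model = torch.nn.Linear(8, 8)
initial_model = copy.deepcopy(model)              # frozen copy of the original weights
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
reg_strength = 0.1                                # illustrative weight for the penalty

x = torch.randn(4, 8)
task_loss = model(x).pow(2).mean()                # stand-in for the unlearning/retaining loss
loss = task_loss + reg_strength * parameter_drift_penalty(model, initial_model)
loss.backward()
optimizer.step()
```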

Evaluating the Framework

To demonstrate the effectiveness of our approach, we perform evaluations using diverse sets of instructions. One example is erasing knowledge about a famous fictional character, Peter Parker, while ensuring that the model can still answer questions about related topics.

We assess the model's performance in various ways:

  • Unlearning Accuracy (UA): How effectively the model generates obliterated responses about the forgotten information.
  • Retaining Accuracy (RA): How well the model responds accurately to questions related to the information we want to keep.
  • Testing Accuracy (TA): The model's performance on completely unrelated topics, ensuring it still performs well in general.

Through these evaluations, we assess the model's ability to forget specific knowledge without sacrificing its overall usefulness.
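The three ratios can be read as simple fractions over the corresponding test sets. The sketch below is a toy version: how a response is actually judged as obliterated or correct follows the paper's evaluation protocol, and the string checks here are simplified stand-ins for that judgment.

```python
# Toy sketch of the three evaluation ratios (UA, RA, TA) as simple fractions.
REFUSAL_MARKERS = ("i'm sorry", "i don't have", "cannot answer")   # illustrative markers

def is_refusal(response: str) -> bool:
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def unlearning_accuracy(forget_responses):
    """UA: share of answers on the forget set that are obliterated (refusals)."""
    return sum(is_refusal(r) for r in forget_responses) / len(forget_responses)

def answer_accuracy(responses, references):
    """RA / TA: share of answers that match the expected reference (toy substring check)."""
    hits = sum(ref.lower() in resp.lower() for resp, ref in zip(responses, references))
    return hits / len(responses)

if __name__ == "__main__":
    ua = unlearning_accuracy(["I'm sorry, but I don't have any information about that."])
    ra = answer_accuracy(["Marvel Comics publishes Spider-Man comics."], ["Marvel Comics"])
    print(f"UA={ua:.2f}  RA={ra:.2f}")
```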

Results and Findings

Our results show that the Snap framework is effective. When we test the model's responses after the unlearning operation, it successfully avoids answering questions about Peter Parker, while still being capable of responding accurately to other types of questions.

In general, the model retains around 95% of its original performance across various tasks, confirming that it can effectively unlearn specific information without diminishing its abilities in other areas.

Addressing Real Personal Data

We also tested the Snap framework with real personal data. Specifically, we examined whether the model could forget information about a well-known individual, using Bill Gates as the target entity.

We performed tests to confirm that the model still functions well when asked about related topics, such as people or organizations tied to Bill Gates, while remaining incapable of discussing information directly about him. These results suggest that Snap can be applied to real-world scenarios of privacy concerns.

Multiple Unlearning Requests

Our exploration also included how well the model manages multiple unlearning requests. We tested both batch unlearning (removing several identities at once) and sequential unlearning (removing identities one at a time). The results indicated that the model could handle both scenarios efficiently while maintaining its capabilities.

Notably, as we unlearn more identities, the model exhibits improved performance on related tasks, reinforcing our approach’s adaptability. This improvement occurs because the model can leverage similar retaining data when tackling new unlearning requests.
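For readers unfamiliar with the two settings, the sketch below contrasts them. The `unlearn` function is a trivial stand-in that only records which entities have been forgotten; a real run would perform a full Snap-style fine-tuning of the model for each request.

```python
# Rough sketch contrasting batch and sequential unlearning requests.
def unlearn(forgotten: set, entities: list[str]) -> set:
    """Stand-in for one unlearning run: mark the given entities as forgotten."""
    return forgotten | set(entities)

requests = ["Peter Parker", "Bill Gates"]         # example entities from the article

# Batch unlearning: a single run over all requested entities at once.
batch_state = unlearn(set(), requests)

# Sequential unlearning: one run per request, reusing the updated model each time.
sequential_state = set()
for entity in requests:
    sequential_state = unlearn(sequential_state, [entity])

print(batch_state == sequential_state)            # both end up forgetting the same entities
```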

Future Directions

While Snap shows promise for selective unlearning, there remains space for improvement. One limitation is that the framework doesn’t eliminate knowledge entirely; instead, it teaches the model to avoid providing specific information.

Research could focus on refining this process so that the knowledge is more thoroughly removed from the model's parameters. This would address concerns about how effectively an LLM can adhere to privacy regulations while still performing its functions.

Another avenue for future exploration could involve making the framework more generalized to other languages beyond English. As it stands, Snap has been developed primarily for English instruction sets, and there may be opportunities to broaden its reach.

Conclusion

In summary, Snap presents a new approach to unlearning selective knowledge in large language models. By using negative instructions, hard retaining data, and regularization methods, we offer a means of effectively removing unwanted information while retaining the model's overall capabilities.

This framework holds valuable implications for real-world applications where privacy and copyright are essential considerations. As LLMs continue to be integrated into various services, having effective methods for unlearning will be crucial in safeguarding user information.

Human Evaluation of the Framework

To validate the effectiveness of our instruction sets, we conducted human evaluations. We assessed the relevance, diversity, and accuracy of the generated instructions. Evaluators reviewed a variety of instances, ensuring that the questions were appropriate for the entities we aimed to unlearn.

Our findings indicate a high level of relevance and diversity within the instruction sets, supporting the effectiveness of using automated methods for generating both negative and retaining instructions.

Appendix: Dataset Examples

We include examples of how we built our negative and retaining instruction sets. Each question is paired with a response that aligns with our objectives of erasing certain knowledge while maintaining clarity on related topics.

In each dataset, we aimed for a balance between factual questions and more expansive open-ended questions, ensuring that the LLM can perform well across various types of inquiries.

This structured approach helps us create a robust dataset for unlearning select knowledge, making it easier for the model to adapt and perform effectively in practical use cases.
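In that spirit, the records below show what such entries could look like. They are made-up illustrations, not actual items from the paper's dataset (ELUDe), and mix a factual negative instruction, a related retaining instruction, and a more open-ended prompt.

```python
# Illustrative dataset records: one factual negative instruction, one retaining
# instruction about neighboring knowledge, and one open-ended negative instruction.
negative_example = {
    "type": "negative",                      # knowledge to erase
    "question": "What is Peter Parker known for?",
    "answer": "I'm sorry, but I don't have any information about that.",
}

retaining_example = {
    "type": "retaining",                     # neighboring knowledge to keep
    "question": "What company publishes Spider-Man comics?",
    "answer": "Marvel Comics publishes Spider-Man comics.",
}

open_ended_example = {
    "type": "negative",
    "question": "Tell me everything you know about Peter Parker.",
    "answer": "I'm sorry, but I don't have any information about that.",
}

for record in (negative_example, retaining_example, open_ended_example):
    print(record["type"], "|", record["question"])
```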

Original Source

Title: Opt-Out: Investigating Entity-Level Unlearning for Large Language Models via Optimal Transport

Abstract: Instruction-following large language models (LLMs), such as ChatGPT, have become widely popular among everyday users. However, these models inadvertently disclose private, sensitive information to their users, underscoring the need for machine unlearning techniques to remove selective information from the models. While prior work has focused on forgetting small, random subsets of training data at the instance-level, we argue that real-world scenarios often require the removal of an entire user data, which may require a more careful maneuver. In this study, we explore entity-level unlearning, which aims to erase all knowledge related to a target entity while preserving the remaining model capabilities. To address this, we introduce Opt-Out, an optimal transport-based unlearning method that utilizes the Wasserstein distance from the model's initial parameters to achieve more effective and fine-grained unlearning. We also present the first Entity-Level Unlearning Dataset (ELUDe) designed to evaluate entity-level unlearning. Our empirical results demonstrate that Opt-Out surpasses existing methods, establishing a new standard for secure and adaptable LLMs that can accommodate user data removal requests without the need for full retraining.

Authors: Minseok Choi, Daniel Rim, Dohyun Lee, Jaegul Choo

Last Update: 2024-12-04 00:00:00

Language: English

Source URL: https://arxiv.org/abs/2406.12329

Source PDF: https://arxiv.org/pdf/2406.12329

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arxiv for use of its open access interoperability.
