What does "Clean Inputs" mean?
Table of Contents
Clean inputs refer to data that has not been altered or infected by any harmful elements, such as backdoor triggers. In the context of language models, clean inputs are the standard examples that the model is trained on. These inputs help the model learn properly and produce accurate responses.
Using clean inputs is important for ensuring that a language model behaves as expected. When clean data is fed into the model, it generates clear and logical explanations for its decisions. This is crucial for understanding how the model works and for maintaining trust in its outputs.
In contrast, when a model receives poisoned inputs, which contain hidden triggers, it may behave unpredictably. This can lead to poor-quality explanations and a lack of consistency in the model's responses. By focusing on clean inputs, developers can help ensure that language models are safe and reliable.