Simple Science

Cutting edge science explained simply

# Computer Science # Computation and Language # Artificial Intelligence

Addressing Gender Bias in Neural Translation

A new dataset aims to reduce gender bias in machine translations.

― 6 min read


Figure: Fighting gender bias in translations. A dataset to challenge gender bias in translation models.

Neural Machine Translation (NMT) has come a long way, improving in both quality and adoption. However, gender bias remains a serious problem, especially when translating into English, which marks gender, from languages that mark it only weakly. Despite many studies of this issue, there have been no benchmarks for measuring these biases or for evaluating ways to fix them.

To fill this gap, we introduce GATE X-E, a new dataset built from human translations of Turkish, Hungarian, Finnish, and Persian into English. Each translation comes with feminine, masculine, and neutral variants. With 1,250 to 1,850 examples per language pair, GATE X-E offers natural sentences of varying length and domain that challenge rewriting systems on a range of linguistic phenomena. We have also built a gender rewriting tool with GPT-4 and used GATE X-E to evaluate it. We are open-sourcing this work to help others research gender bias in translation.

Gender Bias in Translation

Even though NMT has improved greatly, gender bias still shows up in its output. One major issue is that the translated text sometimes assigns a gender to individuals when the original text does not. This is common when translating from languages that do not mark gender the way English does.

In English, gender can appear through pronouns like "he" or "she," or through specific nouns such as "mother" or "uncle." In the languages we focus on, all personal pronouns are gender-neutral. For example, the Turkish pronoun "O" can mean "he," "she," or "they." As a result, an NMT model may assign a gender based on stereotypes rather than on anything in the source text.
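To make this concrete, here is a minimal sketch in Python. The Turkish sentence is invented for illustration; the three English variants are the kind of alternatives GATE X-E records for each source.

```python
# A minimal illustration with an invented example sentence: the Turkish
# pronoun "O" is gender-neutral, so one source sentence admits three
# English translations.
source = "O bir doktor."  # "O" can mean he, she, or they

translations = {
    "feminine":  "She is a doctor.",
    "masculine": "He is a doctor.",
    "neutral":   "They are a doctor.",
}

for variant, text in translations.items():
    print(f"{variant:>9} -> {text}")
```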

Examples of Gender Bias

When translating from Turkish to English, for instance, the model often chooses "she" for a person whose gender is never specified, especially in sentences about caregiving, reflecting the stereotype that caregivers are women. To address this, we propose providing translations in all three gender forms to cover every possible interpretation.

The underlying problem is that a sentence which never specifies a gender can come out gender-marked in translation, with the choice driven by stereotypes that associate certain roles, such as child care, with women. Producing several versions of each translation, masculine and gender-neutral as well as feminine, covers all of the interpretations.

GATE X-E Dataset

GATE X-E is an extension of the original GATE corpus, which evaluated gender rewrites for translations from English into several languages. GATE X-E covers the opposite direction: translations from Turkish, Hungarian, Finnish, and Persian into English. The dataset consists of natural sentences that vary in length and context, testing different aspects of translation.

Data Collection Process

To create GATE X-E, sentence pairs were gathered from various sources and filtered on criteria such as language detection and the presence of gendered terms in the English translations. Native speakers then annotated the surviving sentences with gender types and alternative translations.
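As a rough illustration of the second filter, the sketch below keeps only sentence pairs whose English side contains a gendered term. The lexicon and the example pairs are invented; the paper's actual filtering criteria are not detailed in this summary.

```python
import re

# Hypothetical gendered-term lexicon; illustrative only.
GENDERED_TERMS = re.compile(
    r"\b(he|she|him|her|his|hers|mother|father|uncle|aunt)\b",
    re.IGNORECASE,
)

def has_gendered_term(english: str) -> bool:
    """True if the English side contains at least one gendered term."""
    return bool(GENDERED_TERMS.search(english))

pairs = [
    ("O bir doktor.", "She is a doctor."),           # kept: "She" is gendered
    ("Kitap masada.", "The book is on the table."),  # dropped: no gendered term
]
kept = [(src, tgt) for src, tgt in pairs if has_gendered_term(tgt)]
print(kept)
```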

Each instance in GATE X-E includes a source sentence and its translations, covering the different gender interpretations. For each translation, we record whether it includes gender-marked terms and how many Arbitrarily Gender-Marked Entities (AGMEs) it contains. AGMEs are entities whose gender is not specified in the source sentence but is marked in the translation.
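Putting those fields together, one instance might be represented like this. The schema and field names are hypothetical; the released dataset's exact format may differ.

```python
from dataclasses import dataclass

# Hypothetical schema assembled from the fields described above.
@dataclass
class GateXEInstance:
    source: str       # sentence in the weakly-gendered source language
    feminine: str     # English translation, feminine variant
    masculine: str    # English translation, masculine variant
    neutral: str      # English translation, neutral variant
    agme_count: int   # entities gendered in the translation but not the source
    has_gender_marked_terms: bool

example = GateXEInstance(
    source="O bir doktor.",
    feminine="She is a doctor.",
    masculine="He is a doctor.",
    neutral="They are a doctor.",
    agme_count=1,
    has_gender_marked_terms=True,
)
print(example.agme_count)
```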

Characteristics of the Dataset

The dataset offers a wide variety of cases. Many instances contain exactly one AGME, while some have none or more than one. In most cases the gender markers are pronouns that have no counterpart in the source. The dataset also includes mixed cases, where some individuals are gender-neutral and others are gender-marked.

The challenge lies in ensuring that translations remain consistent and accurate while exploring different gender interpretations. The annotators worked through the sentences, marking errors and providing alternative translations as needed.

Gender Rewriting Process

Gender rewriting takes a sentence translated with one specific gender and produces new versions with different gender assignments. The goal is to offer translations that reflect every potential gender interpretation without altering the message of the text.
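To see why this is harder than it sounds, consider a deliberately naive pronoun swap. English "her" maps to either "him" or "his" depending on its role in the sentence, so a plain dictionary lookup produces wrong output:

```python
# A deliberately naive pronoun swap, to show why rewriting is not a
# dictionary lookup (sketch only; real systems must resolve ambiguity).
FEM_TO_MASC = {"she": "he", "her": "him", "hers": "his", "herself": "himself"}

def naive_masculine_rewrite(sentence: str) -> str:
    out = []
    for word in sentence.split():
        core = word.strip(".,").lower()
        repl = FEM_TO_MASC.get(core)
        if repl is None:
            out.append(word)
        else:
            # Preserve capitalization and trailing punctuation.
            if word[0].isupper():
                repl = repl.capitalize()
            out.append(repl + word[len(word.rstrip(".,")):])
    return " ".join(out)

print(naive_masculine_rewrite("She packed her bag."))
# -> "He packed him bag."  Wrong: this "her" is possessive ("his"),
#    not an object pronoun, which is exactly what trips up naive rules.
```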

Types of Problems in Gender Rewriting

There are two main categories of rewriting problems to consider:

  1. Pronoun-Only Problems: The only gender markers in the translation are pronouns. Because the source language's pronouns are gender-neutral, every individual mentioned in the translation is an AGME, and the rewrite simply changes the pronouns to the desired gender while keeping the meaning intact.

  2. Gendered-Noun Problems: The translation contains nouns that explicitly indicate gender, and some of them may be gendered in the source as well. This type of rewriting is more complex because the system must separate the AGMEs from entities that are gender-marked in both the source and the translation, handling gendered nouns and neutral terms side by side. A rough sketch of telling the two problem types apart follows this list.
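One way to separate the two problem types is to check which kinds of gender markers the translation contains. The small lexicons in this sketch are illustrative stand-ins, not the paper's actual word lists:

```python
import re

PRONOUNS = re.compile(
    r"\b(he|she|him|her|his|hers|himself|herself)\b", re.IGNORECASE)
GENDERED_NOUNS = re.compile(
    r"\b(mother|father|uncle|aunt|brother|sister|actress|waitress)\b",
    re.IGNORECASE)

def problem_type(translation: str) -> str:
    """Classify a translation by the gender markers it contains."""
    if GENDERED_NOUNS.search(translation):
        return "gendered-noun"
    if PRONOUNS.search(translation):
        return "pronoun-only"
    return "no gender markers"

print(problem_type("She gave him the book."))             # pronoun-only
print(problem_type("She gave the book to her brother."))  # gendered-noun
```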

The Role of GPT-4

To assist with rewriting, we developed a solution based on GPT-4. It generates three versions of each translation: gender-neutral, all-female, and all-male. The system guides GPT-4 step by step through identifying the AGMEs and then rewriting the original translation accordingly.

GPT-4 is given detailed prompts that spell out exactly what is needed. Its outputs are then compared with the reference translations to check for accuracy.
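A minimal sketch of such a prompt-based rewriter is shown below, using the OpenAI chat API. The prompt text is an invented approximation; the authors' actual step-by-step prompts are not reproduced in this summary.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative prompt only, not the paper's actual wording.
PROMPT = """\
You are rewriting an English translation of a Turkish sentence.
Step 1: List each entity whose gender is marked in the translation but
not in the source (the arbitrarily gender-marked entities).
Step 2: Rewrite the translation three times, making those entities
feminine, masculine, and gender-neutral in turn. Change nothing else.

Source (Turkish): {source}
Translation (English): {translation}
"""

def rewrite_variants(source: str, translation: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": PROMPT.format(source=source,
                                            translation=translation)}],
    )
    return response.choices[0].message.content

print(rewrite_variants("O bir doktor.", "She is a doctor."))
```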

Evaluation of the Rewriting System

We assessed our GPT-4-based rewriting solution on the GATE X-E dataset by measuring how often the rewritten sentences matched the expected reference versions.
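In its simplest form, that measurement is an exact-match accuracy over the rewritten variants. Whether the paper uses exact match or a softer comparison is not stated here, so treat this as a stand-in:

```python
def rewrite_accuracy(system_outputs, references):
    """Fraction of rewrites that exactly match the reference variant
    (a simple stand-in; the paper's metric may be more forgiving)."""
    matches = sum(s.strip() == r.strip()
                  for s, r in zip(system_outputs, references))
    return matches / len(references)

print(rewrite_accuracy(
    ["He is a doctor.", "They are a doctor."],
    ["He is a doctor.", "They are doctors."],
))  # 0.5
```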

Accuracy Results

The system performed very well on cases where only pronouns were involved, achieving high accuracy. Accuracy dropped, however, on tasks involving gendered nouns. This gap is expected: cases with marked nouns are more complex than those that rely purely on pronouns.

Challenges in Gendered-Noun Rewriting

The main difficulties arise in sentences that include gendered nouns. Here the meaning can change depending on which terms are rewritten, so it is critical to modify only the appropriate ones; misidentifying which terms may be changed leads directly to errors in the output.
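An invented mixed example makes the distinction visible. Turkish "anne" ("mother") is gendered in the source, so the mother must stay feminine in every variant, while the child, whose gender the source never marks, is the only AGME:

```python
# Invented example: in "Annesi onu okula götürdü" ("His/Her mother took
# him/her to school"), "mother" is gendered in the source and stays
# feminine everywhere; only the child is an AGME, surfacing twice
# (as the possessor and as the object).
variants = {
    "feminine":  "Her mother took her to school.",
    "masculine": "His mother took him to school.",
    "neutral":   "Their mother took them to school.",
}
for name, text in variants.items():
    print(f"{name:>9} -> {text}")
```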

Human Evaluation of Accuracy

To further ensure the quality of the translations, we sought human evaluations. Evaluators looked at outputs where the rewriting system made errors, noting whether nouns or pronouns were incorrectly changed or missed.

Common Error Types

Errors fell into two categories: extraneous changes, where the system modified terms it should have left alone, and missing changes, where it failed to alter gendered terms that should have been modified.
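A crude way to tell the two error types apart is to compare the system output and the reference variant token by token against the original translation. This heuristic assumes all three sentences have the same number of tokens and is purely illustrative:

```python
def error_type(original: str, system: str, reference: str) -> str:
    """Rough split into the two error categories described above."""
    orig, sys_out, ref = (x.lower().split()
                          for x in (original, system, reference))
    for o, s, r in zip(orig, sys_out, ref):
        if s != r:
            if s == o:
                return "missing change"     # kept a term it should have changed
            if r == o:
                return "extraneous change"  # changed a term it should have kept
    return "no error found"

# The system should have changed "She" to "He" but left it alone:
print(error_type("She is a doctor.",   # original translation
                 "She is a doctor.",   # system output (unchanged)
                 "He is a doctor."))   # reference masculine variant
# -> missing change
```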

Conclusion

With GATE X-E, we provide a valuable resource for studying gender bias in machine translations. The dataset helps to expose the complexities involved in translating gendered language and offers a means to develop and evaluate tools that mitigate these biases. Our hope is that it will inspire further research and lead to more equal representation in translations across different languages.

Future research might explore using open-source models for gender rewriting and assessing their effectiveness. We aim to keep the conversation going to create more inclusive and fair translations.

Original Source

Title: GATE X-E : A Challenge Set for Gender-Fair Translations from Weakly-Gendered Languages

Abstract: Neural Machine Translation (NMT) continues to improve in quality and adoption, yet the inadvertent perpetuation of gender bias remains a significant concern. Despite numerous studies on gender bias in translations into English from weakly-gendered languages, there are no benchmarks for evaluating this phenomenon or for assessing mitigation strategies. To address this gap, we introduce GATE X-E, an extension to the GATE (Rarrick et al., 2023) corpus, that consists of human translations from Turkish, Hungarian, Finnish, and Persian into English. Each translation is accompanied by feminine, masculine, and neutral variants. The dataset, which contains between 1250 and 1850 instances for each of the four language pairs, features natural sentences with a wide range of sentence lengths and domains, challenging translation rewriters on various linguistic phenomena. Additionally, we present a translation gender rewriting solution built with GPT-4 and use GATE X-E to evaluate it. We open source our contributions to encourage further research on gender debiasing.

Authors: Spencer Rarrick, Ranjita Naik, Sundar Poudel, Vishal Chowdhary

Last Update: 2024-02-21

Language: English

Source URL: https://arxiv.org/abs/2402.14277

Source PDF: https://arxiv.org/pdf/2402.14277

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arXiv for use of its open access interoperability.
