Addressing Gender Bias in Machine Translation
New resource aims to tackle gender ambiguity in translation systems.
Table of Contents
- What is GATE?
- The Importance of Gender in Translation
- The Challenge of Arbitrary Gender Marking
- Building the GATE Corpus
- Challenges in Translation
- Evaluating Translation Systems
- Linguistic Considerations
- Coreference and Gender Agreement
- The Role of Gender Generics
- Related Work and Future Directions
- Conclusion
- Original Source
Machine translation has improved considerably on sentences that state gender explicitly, such as "he is a teacher" or "she is a doctor." Sentences in which gender is not specified remain a challenge. When the source sentence leaves gender open, machine translation systems often default to stereotypical gender roles, which introduces bias. For instance, a sentence that could refer to either a man or a woman may be translated with masculine forms automatically, reinforcing gender stereotypes.
To tackle this problem, systems called "gender rewriters" have been developed. They take a translation that commits to one gender and produce alternative translations that reflect the other gender interpretations. However, these systems often have poor linguistic coverage and miss many of the ways in which languages express gender. To help improve them, we have created a new resource called GATE, a collection of sentences that can be translated in multiple ways depending on gender.
What is GATE?
GATE stands for "Gender-Ambiguous Translation Examples." It is a collection of English sentences in which gender is ambiguous, together with their translations into three Romance languages: Spanish, French, and Italian. Each English sentence is paired with several translations, one for each gender assignment the sentence allows. This diverse set of examples is intended to help researchers develop better gender-rewriting systems and improve overall translation quality.
The Importance of Gender in Translation
Languages express gender in different ways. For example, in English, the word "nurse" can refer to either a man or a woman. However, in Spanish, there are two different words for nurse: "enfermera" for females and "enfermero" for males. This difference in expression leads to translation challenges. When a machine translation model encounters an ambiguous gender, it often chooses one gender arbitrarily, which can perpetuate harmful stereotypes.
To highlight this issue, we have coined the term "arbitrary gender marking" for situations where a translation assigns gender without clear indication from the source text. We refer to the entities in these cases as Arbitrarily Gender-Marked Entities (AGMEs).
The Challenge of Arbitrary Gender Marking
Arbitrary gender marking is a significant problem because it can reinforce societal biases. For instance, if a machine translation model translates "the surgeon" only as "el cirujano" (male) and never as "la cirujana" (female), it implies that surgeons are male by default, even though the source sentence says nothing about gender. There has been progress on systems that rewrite such translations to cover both gender options, but current models often struggle to do so effectively.
The goal of GATE is to provide a more reliable source of examples that include gender ambiguity, allowing for better evaluation and improvement in translation rewriters. Each English sentence in our collection is paired with various translations that reflect all possible gender assignments.
Building the GATE Corpus
The GATE corpus has been carefully constructed with the help of bilingual linguists who are knowledgeable in the languages involved. Our aim was to collect about 2,000 examples for each target language, ensuring that these examples reflect a wide variety of sentence structures, lengths, and vocabulary.
Each example consists of an English sentence containing at least one AGME, and translations into the target language that correspond to all possible male and female interpretations. For instance, the sentence "I know a Turk who lives in Paris" can be translated into Spanish as both "Conozco a una turca que vive en París" (female) and "Conozco a un turco que vive en París" (male).
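One way to picture such an entry is as a small record holding the source sentence and one translation per gender assignment. The sketch below is only an illustration: the class name `GateExample`, its field names, and the sample sentence are assumptions made here, not the released GATE file format or an entry from the corpus.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GateExample:
    """Hypothetical container for one entry; not the released GATE schema."""
    source: str        # English sentence containing at least one AGME
    target_lang: str   # target language code, e.g. "es", "fr", "it"
    # Gender label -> translation that realizes that gender for the AGME.
    translations: Dict[str, str] = field(default_factory=dict)


# Illustrative sentence written for this sketch, not drawn from the corpus.
example = GateExample(
    source="The nurse is tired.",
    target_lang="es",
    translations={
        "female": "La enfermera está cansada.",
        "male": "El enfermero está cansado.",
    },
)

for gender, text in example.translations.items():
    print(f"{gender}: {text}")
```

Keying the translations by gender label keeps the alternatives for one source sentence together, which is convenient when a rewriter's output has to be compared against a specific alternative.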
Challenges in Translation
When translating sentences, it is essential to consider how gender is marked. In some cases, a single English sentence may have multiple gendered translations in another language. Our corpus reflects this complexity by presenting a range of examples that showcase the diversity of gender expression across languages.
Each example is annotated with linguistic properties, such as which nouns may refer to gender and their grammatical roles in the sentences. This detailed annotation helps ensure the sentences in GATE can serve as effective test cases for gender rewriters.
Evaluating Translation Systems
One of the key aspects of developing better translation systems is evaluating how well they perform. By using GATE, we can assess how accurately a translation system generates multiple translations covering different gender assignments. When performing this evaluation, we focus on matching the output translation to the correct gendered alternative from our dataset.
Our evaluation method checks whether the translation system has accurately transformed the gender of the AGME in the translated sentence. We consider a translation successful if it correctly matches the desired gender assignment. We also account for cases where the translation system may not produce any gendered output at all, which can happen when there are no AGMEs in the sentence.
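The matching step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the evaluation tooling released with GATE: the function names are hypothetical, matching is done by exact string comparison after light normalization, and a sentence with no AGME is modeled by a reference of `None`, meaning the rewriter is expected to produce no alternative.

```python
from typing import Iterable, Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def is_correct(output: Optional[str], reference: Optional[str]) -> bool:
    """Compare one rewriter output against the expected gendered alternative."""
    if reference is None:
        # No AGME in the source: the rewriter should produce nothing.
        return output is None
    if output is None:
        # An alternative was expected but the rewriter produced none.
        return False
    return normalize(output) == normalize(reference)


def accuracy(pairs: Iterable[Tuple[Optional[str], Optional[str]]]) -> float:
    """Fraction of (output, reference) pairs judged correct."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(is_correct(out, ref) for out, ref in pairs) / len(pairs)


# Illustrative pairs written for this sketch, not taken from the corpus.
pairs = [
    ("La enfermera está cansada.", "La enfermera está cansada."),  # match
    ("El cirujano", "La cirujana"),                                # wrong gender
    (None, None),                                                  # no AGME
]
print(accuracy(pairs))
```

Exact matching is strict: a rewriter that changes the gender correctly but also alters an unrelated word is counted as wrong, which may or may not be the desired behavior for a given study.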
Linguistic Considerations
When working with gender in translation, it is essential to understand how different languages express it. In Romance languages such as Spanish, French, and Italian, every noun has a grammatical gender, either masculine or feminine. For nouns that refer to animate entities such as people, this grammatical gender usually reflects the gender of the referent. However, this is not always the case: some nouns that refer to people do not reveal the referent's gender.
For instance, the Spanish word for "person" is always grammatically feminine ("la persona"), whatever the gender of the person referred to. By contrast, a word like "doctor" can be translated as "doctora" (female) or "doctor" (male), so context plays an important role in determining gender in translation.
Coreference and Gender Agreement
Coreference is another crucial aspect of translation related to gender. This occurs when different parts of a sentence refer to the same entity. For example, in the sentence "My best friend is a nurse," the word "friend" may refer to a person whose gender is unknown, while "nurse" may indicate gender depending on the translation.
When translating, it is important to keep gender assignment consistent across coreferent mentions. If "nurse" is translated with the feminine form ("enfermera"), then "friend", which refers to the same person, must also be feminine ("Mi mejor amiga es enfermera"); the masculine reading changes both words ("Mi mejor amigo es enfermero"). This focus on coreference helps ensure an accurate and coherent translation.
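A simple way to enforce this consistency is to assign gender per entity rather than per mention, so that coreferent mentions can never disagree. The following is a minimal sketch under that assumption; the function name, the entity identifiers, and the two-label gender set are illustrative choices, not part of GATE.

```python
from itertools import product
from typing import Dict, Iterator

GENDERS = ("female", "male")


def gender_assignments(mention_to_entity: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every mention-level gender assignment that respects coreference.

    Gender is chosen once per entity and then copied to each of its mentions.
    """
    entities = sorted(set(mention_to_entity.values()))
    for genders in product(GENDERS, repeat=len(entities)):
        entity_gender = dict(zip(entities, genders))
        yield {m: entity_gender[e] for m, e in mention_to_entity.items()}


# "My best friend is a nurse": both mentions refer to the same entity "e1".
mentions = {"friend": "e1", "nurse": "e1"}
assignments = list(gender_assignments(mentions))
for assignment in assignments:
    print(assignment)
```

With one entity there are two assignments, and "friend" and "nurse" always share a gender. A sentence with two independent entities would yield four.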
The Role of Gender Generics
In many languages, there is a practice called "masculine generics," where masculine terms are used by default when referring to mixed-gender groups or when the gender is unknown. For example, the Spanish masculine plural "los doctores" can refer to a group that includes both male and female doctors. However, this practice can skew representation towards men, which can be problematic.
To address this, our linguists were instructed to provide alternatives using feminine terms when appropriate, ensuring that multiple gender options are available in the translation. Our work aims to promote inclusive language practices and provide equal representation for all genders in translated texts.
Related Work and Future Directions
There has been considerable research on gender bias in machine translation. Various challenge sets and datasets have been created to evaluate how well translation systems handle gender issues. These efforts are important for understanding the extent to which translation systems reflect societal biases and stereotypes.
Moving forward, we plan to expand the GATE corpus to include additional languages and explore other gender-related phenomena in translation. One goal is to include examples that illustrate ambiguous gender situations while also providing clear and unambiguous gender references.
Furthermore, we aim to investigate the use of gender-neutral language constructs to better accommodate non-binary identities and promote inclusive language practices in translation systems.
Conclusion
The GATE corpus is a significant step in addressing the challenges faced when translating gender-ambiguous sentences. By providing a diverse set of examples that reflect various gender interpretations, we are paving the way for improved translation systems that reduce bias and enhance the quality of machine-generated translations.
As language evolves, our understanding and approach to gender representation must also grow. Through continued research and development, we hope to create more inclusive and accurate translation tools that better represent the diversity of human identity and experience.
Original Source
Title: GATE: A Challenge Set for Gender-Ambiguous Translation Examples
Abstract: Although recent years have brought significant progress in improving translation of unambiguously gendered sentences, translation of ambiguously gendered input remains relatively unexplored. When source gender is ambiguous, machine translation models typically default to stereotypical gender roles, perpetuating harmful bias. Recent work has led to the development of "gender rewriters" that generate alternative gender translations on such ambiguous inputs, but such systems are plagued by poor linguistic coverage. To encourage better performance on this task we present and release GATE, a linguistically diverse corpus of gender-ambiguous source sentences along with multiple alternative target language translations. We also provide tools for evaluation and system analysis when using GATE and use them to evaluate our translation rewriter system.
Authors: Spencer Rarrick, Ranjita Naik, Varun Mathur, Sundar Poudel, Vishal Chowdhary
Last Update: 2023-03-07 00:00:00
Language: English
Source URL: https://arxiv.org/abs/2303.03975
Source PDF: https://arxiv.org/pdf/2303.03975
Licence: https://creativecommons.org/licenses/by-nc-sa/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to arXiv for use of its open access interoperability.