Detecting Misogyny in Italian Social Media Language
A study on identifying misogynistic language through pejorative words in tweets.
― 8 min read
Misogyny is often expressed through hurtful language, which makes automatic detection in social media important. Many neutral words can carry a negative meaning when used as insults, so understanding how such words are being used is crucial for spotting misogynistic language. To help with this, we present a new collection of 1,200 Italian tweets, manually annotated for pejorative language at the word level and for misogyny at the tweet level.
Pejorative Language
Pejorative language refers to words or phrases that can belittle or insult someone. For example, some words can seem harmless but can take on a negative meaning based on how they are used. Certain terms can refer to both neutral ideas and negative traits. The way these terms shift in meaning depends on their context. This change in meaning is known as pejoration.
In contrast, melioration is when a term that starts off negative eventually takes on a neutral or positive meaning. For instance, some slurs can be reclaimed by the groups they were used against, changing their impact over time.
Pejorative terms are particularly relevant when looking for signs of misogyny, since many otherwise harmless words are used to insult women, often targeting their looks or intelligence. We call these harmful terms "pejorative epithets." Italian examples include "balena," which literally means "whale" but is used to insult overweight women, and "gallina," literally "hen," which can imply stupidity.
Modern language models struggle to identify misogynistic language accurately when sentences include these ambiguous terms. When words that can mean multiple things appear in the training data but not in the test data, it leads to many classification mistakes.
To improve the detection of misogynistic language, we propose disambiguating pejorative terms first. Our aim is to find out if clarifying potentially hurtful terms can lead to better identification of misogynistic language while also reducing errors.
Research Questions and Methodology
To tackle our objectives, we focus on three main questions:
- What pejorative words are commonly used against women online?
- Can we improve models to identify whether words in tweets are used negatively or neutrally?
- How well do language models understand pejorative words in context?
To address the first question, we create a list of offensive terms used to target women. This helps us gather tweets that contain these words, which we then use to build our collection of 1,200 tweets.
For the second question, we fine-tune two models based on BERT, a popular language understanding model. The first model determines if a word in a tweet is used negatively or neutrally, while the second model detects misogyny. The results from the first model help inform the second one about the nature of the words used.
In response to the third question, we analyze how well larger language models understand these pejorative terms using their word patterns in context.
Corpus Compilation
To gather the pejorative words used against women in Italian online communities, we follow two main steps:
Creating a Lexicon: We gather a list of words from various sources, including input from native speakers who regularly use social media, as well as existing databases of offensive terms. The focus is on polysemous words, i.e., words with both neutral and negative meanings.
Retrieving Tweets: Using the compiled list, we collect tweets that include these pejorative terms. For our collection, we aim for a balanced mix of tweets using these words in both neutral and offensive ways.
To ensure the quality of our lexical choices, we manually verify that these words can be used in both ways by searching for them on Twitter. As a result, we end up with a final list of 24 words that have this dual usage.
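As a rough illustration of the retrieval step, the sketch below filters a handful of example tweets against a two-word slice of the lexicon. The tweets, the regular expression, and the helper name are illustrative choices, not the exact pipeline used for the corpus (which relied on the Twitter API, e.g., via twarc).

```python
# Illustrative sketch: matching lexicon terms in tweet texts.
# The lexicon slice and tweets below are toy examples.
import re

lexicon = ["balena", "gallina"]  # two of the 24 polysemous terms
pattern = re.compile(r"\b(" + "|".join(map(re.escape, lexicon)) + r")\b",
                     re.IGNORECASE)

def find_lexicon_terms(tweet: str):
    """Return the lexicon words occurring in a tweet, if any."""
    return [m.group(0).lower() for m in pattern.finditer(tweet)]

tweets = [
    "Ho visto una balena enorme nel documentario",   # neutral use
    "Sei proprio una gallina, non capisci niente",   # pejorative use
]
matches = {t: find_lexicon_terms(t) for t in tweets if find_lexicon_terms(t)}
print(matches)
```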
Data Annotation
To label our dataset according to word meanings and misogyny detection, we enlist six annotators with expertise in various fields. Initially, we conduct a pilot study to explore the challenges in labeling and check for differences in perspectives between male and female annotators.
The annotation follows a flexible approach, allowing for personal judgments without strict guidelines. Each annotator examines 50 tweets. The consistency of the labels is then measured with an inter-annotator agreement coefficient, which shows moderate agreement among the group.
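As an illustration of how such agreement can be measured, the sketch below computes Fleiss' kappa over toy pilot ratings with statsmodels. The ratings matrix is invented for the example, and the exact coefficient used in the study may differ.

```python
# Illustrative sketch: inter-annotator agreement on word-level labels.
# Rows are tweets, columns are the six annotators,
# values are the assigned labels (0 = neutral, 1 = pejorative).
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

ratings = np.array([
    [1, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 0],
])
table, _ = aggregate_raters(ratings)   # tweets x categories count matrix
print(f"Fleiss' kappa: {fleiss_kappa(table):.2f}")
```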
Through feedback received from the annotators, we identify several key areas of debate:
Lack of Context: Some tweets are too brief, making the author's intent unclear. We label these as neutral when the meaning cannot be determined.
Objectifying Compliments: Some tweets might seem complimentary on the surface but are actually objectifying. We classify these as pejorative.
Usage Toward Objects: A term used negatively for an inanimate object does not automatically make it pejorative. We label these as neutral.
Pejorative Terms Against Men: Words used to insult men should be labeled as pejorative, even if they don't pertain to the main focus of the study.
Reported Speech: If a pejorative term is used in reported speech, it can still qualify as negative despite the overall context being neutral. We categorize these as pejorative.
After the pilot study, we annotate the full collection of 1,200 tweets. To keep the labels consistent across the dataset, a single annotator handles this task. The final correlation between the misogyny and pejorative labels shows a significant link: many tweets carrying pejorative words are also identified as misogynistic.
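One simple way to quantify this association is the phi coefficient between the two binary labels, which for binary data coincides with the Matthews correlation. The label vectors below are toy data, not the corpus statistics.

```python
# Illustrative sketch: association between word-level and tweet-level labels.
from sklearn.metrics import matthews_corrcoef

pejorative = [1, 1, 0, 0, 1, 0, 1, 0]   # word used pejoratively?
misogynous = [1, 1, 0, 0, 1, 0, 0, 0]   # tweet labeled misogynistic?
print(f"phi = {matthews_corrcoef(pejorative, misogynous):.2f}")
```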
Methodology for Detecting Misogyny
To evaluate the effectiveness of our approach, we utilize a popular BERT-based model called AlBERTo. We fine-tune it to perform pejorative word disambiguation and misogyny detection.
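A minimal sketch of such fine-tuning with the Hugging Face Trainer is shown below, here for the word-level pejorative-vs-neutral classifier. The checkpoint identifier, hyperparameters, and toy training examples are assumptions for illustration, not the exact configuration used in the study.

```python
# Sketch: fine-tuning an Italian BERT model as a binary classifier
# (1 = pejorative use of the target word, 0 = neutral use).
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Assumed AlBERTo-style checkpoint id; any Italian BERT checkpoint would do.
MODEL_NAME = "m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alb3rt0"

class TweetDataset(Dataset):
    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=128)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Toy examples only.
train_texts = ["sei proprio una balena", "ho visto una balena al largo"]
train_labels = [1, 0]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="pejorative-clf",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=TweetDataset(train_texts, train_labels, tokenizer),
)
trainer.train()
```

The misogyny detection model is fine-tuned the same way, with tweet-level misogyny labels in place of the word-level ones.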
The disambiguation task involves identifying whether a word in a sentence is used pejoratively or neutrally. This classification enriches the input of the misogyny detection model. We explore two injection methods, sketched in the code after this list:
- Concatenation: Adding information about whether words are pejorative at the end of tweets.
- Substitution: Replacing ambiguous terms with their clear, unambiguous equivalents.
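The sketch below illustrates both injection strategies as plain string manipulation. The tag format, the [SEP] separator, and the mapping from ambiguous words to univocal equivalents are illustrative assumptions rather than the exact choices made in the paper.

```python
# Sketch of the two injection strategies (illustrative wording and mapping).
PEJORATIVE_SYNONYMS = {          # ambiguous word -> unambiguous equivalent (assumed)
    "balena": "donna grassa",    # "whale" as an insult -> "fat woman"
    "gallina": "donna stupida",  # "hen" as an insult -> "stupid woman"
}

def concatenate(tweet: str, word: str, is_pejorative: bool) -> str:
    """Append the disambiguation output to the tweet text."""
    tag = "pejorative" if is_pejorative else "neutral"
    return f"{tweet} [SEP] {word} is {tag}"

def substitute(tweet: str, word: str, is_pejorative: bool) -> str:
    """Replace the ambiguous word with a univocal equivalent when pejorative."""
    if is_pejorative and word in PEJORATIVE_SYNONYMS:
        return tweet.replace(word, PEJORATIVE_SYNONYMS[word])
    return tweet

print(concatenate("sei una balena", "balena", True))
print(substitute("sei una balena", "balena", True))
```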
We run experiments on our dataset and benchmark datasets, looking for improvements in classification accuracy.
Results and Evaluation
The results of our experiments demonstrate that disambiguating pejorative words significantly enhances the detection of misogynistic language. Both approaches we tried, concatenation and substitution, show clear improvements in model performance.
We also analyze the false-positive rates, looking at how many times the model incorrectly labels neutral tweets as misogynistic. After applying our pejorative word disambiguation, we observe a noticeable drop in false positives, especially in our test set.
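For reference, the false-positive rate can be read off a confusion matrix as FP / (FP + TN); the sketch below uses toy predictions rather than our actual model outputs.

```python
# Illustrative sketch: false-positive rate for misogyny detection
# (neutral tweets wrongly flagged as misogynistic).
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 1, 0, 1, 0, 0]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"false-positive rate = {fp / (fp + tn):.2f}")
```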
While we see gains in our results, the impact on older benchmark datasets is more limited due to their lower number of pejorative examples. This suggests our approach works best when the training set includes a good mix of pejorative and neutral uses.
Qualitative Error Analysis
To further understand where our models struggle, we manually review misclassified tweets across different settings.
In cases where reported misogyny is present, models often have trouble recognizing the intent behind a pejorative term used in a condemning context. Additionally, when pejorative terms are directed at men, these instances are sometimes incorrectly classified as misogynistic.
Analysis of Word Embeddings
To analyze how well our model learns the meanings of pejorative words, we extract and study the word embeddings it uses. These embeddings help depict how closely related words are in meaning.
We look specifically at the average similarity between our pejorative terms and their neutral or negative anchor words. The findings show a clear distinction in how well the model captures context after fine-tuning, indicating that it has indeed learned to understand the meanings of these words better.
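A sketch of this kind of analysis is shown below: it extracts the contextual embedding of a target word and compares it with embeddings of neutral and insulting anchor words via cosine similarity. The model identifier, sentences, and anchor words are placeholders, not the fine-tuned model or anchor set used in the study.

```python
# Sketch: contextual embedding similarity between a target word and anchors.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "dbmdz/bert-base-italian-uncased"   # placeholder Italian BERT
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME).eval()

def word_embedding(sentence: str, word: str) -> torch.Tensor:
    """Mean of the last-hidden-state vectors of the word's subword tokens."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]          # (seq_len, dim)
    word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(word_ids) + 1):            # locate subword span
        if ids[i:i + len(word_ids)] == word_ids:
            return hidden[i:i + len(word_ids)].mean(dim=0)
    raise ValueError(f"'{word}' not found in '{sentence}'")

target  = word_embedding("sei proprio una balena", "balena")       # insulting use
neutral = word_embedding("la balena nuota nell'oceano", "balena")  # neutral anchor
insult  = word_embedding("sei una donna grassa", "grassa")         # negative anchor
print(f"sim(target, neutral anchor)  = {F.cosine_similarity(target, neutral, dim=0).item():.2f}")
print(f"sim(target, negative anchor) = {F.cosine_similarity(target, insult, dim=0).item():.2f}")
```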
Analysis of Language Models
To further investigate how well models understand pejorative terms, we prompt popular large language models to clarify the meanings of these words in context, in a zero-shot setting without any task-specific fine-tuning.
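A zero-shot prompt of this kind might look like the sketch below. The model identifier and the prompt wording are illustrative assumptions; the prompts used in the study may differ.

```python
# Sketch: zero-shot prompting an open-source model to disambiguate an epithet.
from transformers import pipeline

generator = pipeline("text-generation",
                     model="mistralai/Mistral-7B-Instruct-v0.2")  # example model

prompt = (
    "In the tweet 'sei proprio una balena', is the word 'balena' used in its "
    "neutral sense (the animal) or as a pejorative epithet? Answer briefly."
)
print(generator(prompt, max_new_tokens=60, do_sample=False)[0]["generated_text"])
```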
Three open-source models are tested, and we find that while one model performs well in understanding subtle variations in meaning, others struggle significantly and often provide generic responses that do not clarify the terms effectively.
This reveals a gap in how well these models grasp nuanced meanings, suggesting that further development and training could yield better outcomes.
Conclusion
We have introduced a method that disambiguates pejorative words as a first step toward detecting misogyny in tweets. By building a lexicon of polysemous words and a novel dataset of tweets, we have shown that clarifying word meanings can improve detection efforts.
The experiments highlight our model's ability to reduce misclassification rates, and our analysis of word embeddings illustrates improved comprehension of nuanced meanings after fine-tuning.
Finally, we found that other language models have room for improvement when it comes to disambiguating pejorative terms. Future efforts may include expanding this work to more languages and cultures, which would allow for a broader perspective on how language shapes perceptions of gender.
Ethical Considerations
We adhered to Twitter's guidelines for data use when collecting our dataset from publicly available tweets. The anonymity of individuals mentioned in our work is strictly maintained.
While our research focuses on the Italian language, the findings hint at the potential for extending this approach to more languages. This would provide further insight into the usage of pejorative terms and their implications in different cultural contexts.
Though our findings are valuable, we acknowledge the limitations of a single annotator's perspective and the challenges introduced by word substitutions that may not always carry the same meaning.
As we move forward, incorporating a wider range of models and addressing the previously mentioned limitations will strengthen our understanding of language in the context of misogyny detection.
Title: PejorativITy: Disambiguating Pejorative Epithets to Improve Misogyny Detection in Italian Tweets
Abstract: Misogyny is often expressed through figurative language. Some neutral words can assume a negative connotation when functioning as pejorative epithets. Disambiguating the meaning of such terms might help the detection of misogyny. In order to address such task, we present PejorativITy, a novel corpus of 1,200 manually annotated Italian tweets for pejorative language at the word level and misogyny at the sentence level. We evaluate the impact of injecting information about disambiguated words into a model targeting misogyny detection. In particular, we explore two different approaches for injection: concatenation of pejorative information and substitution of ambiguous words with univocal terms. Our experimental results, both on our corpus and on two popular benchmarks on Italian tweets, show that both approaches lead to a major classification improvement, indicating that word sense disambiguation is a promising preliminary step for misogyny detection. Furthermore, we investigate LLMs' understanding of pejorative epithets by means of contextual word embeddings analysis and prompting.
Authors: Arianna Muti, Federico Ruggeri, Cagri Toraman, Lorenzo Musetti, Samuel Algherini, Silvia Ronchi, Gianmarco Saretto, Caterina Zapparoli, Alberto Barrón-Cedeño
Last Update: 2024-04-03 00:00:00
Language: English
Source URL: https://arxiv.org/abs/2404.02681
Source PDF: https://arxiv.org/pdf/2404.02681
Licence: https://creativecommons.org/licenses/by-sa/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to arxiv for use of its open access interoperability.
Reference Links
- https://www.merriam-webster.com/dictionary/pejorative
- https://github.com/arimuti/PejorativITy
- https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words/tree/master
- https://twarc-project.readthedocs.io
- https://github.com/teelinsan/camoscio
- https://huggingface.co/meta-llama/Llama-2-7b-hf
- https://huggingface.co/mistralai