M-ALERT: Ensuring Multilingual Safety in Language Models
M-ALERT tests language models for safety across five languages.
Felix Friedrich, Simone Tedeschi, Patrick Schramowski, Manuel Brack, Roberto Navigli, Huu Nguyen, Bo Li, Kristian Kersting
― 5 min read
Table of Contents
- What is M-ALERT?
- Why Multilingual Safety Matters
- The Need for Comprehensive Evaluation
- Safety Categories in M-ALERT
- How Does M-ALERT Work?
- Challenges in Translation
- Testing Language Models
- Results of the Testing
- Highlights of Inconsistent Safety
- Understanding Policy Implications
- The Role of Model Size
- Future Directions
- Conclusion
- Humorous Takeaways
- Original Source
- Reference Links
Language models are programs designed to understand and generate human language. They've become incredibly popular and useful in applications ranging from chatbots to content creation. However, as they become more widespread, there are growing concerns about their safety, especially across different languages. This is like having a tool that can build a beautiful house but might accidentally throw in a few explosive bricks.
What is M-ALERT?
M-ALERT is a new benchmark that evaluates the safety of language models in five languages: English, French, German, Italian, and Spanish. Think of it as a safety test for these models, making sure they don't say anything harmful or biased. M-ALERT contains 75,000 prompts in total, 15,000 per language, for the models to respond to. These prompts are sorted into categories to help pinpoint specific safety issues.
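To get a concrete feel for the benchmark, here is a minimal sketch of loading and inspecting the M-ALERT prompts with the Hugging Face `datasets` library. The dataset is hosted at felfri/M-ALERT (see Reference Links below), but the split and column names used here are illustrative assumptions, not a verified schema.

```python
# Minimal sketch: load the M-ALERT prompts and count prompts per category.
# Assumptions (not verified against the actual dataset card): one split per
# language ("en") and columns named "prompt" and "category".
from collections import Counter

from datasets import load_dataset

dataset = load_dataset("felfri/M-ALERT", split="en")  # assumed split name

category_counts = Counter(example["category"] for example in dataset)
for category, count in category_counts.most_common(5):
    print(f"{category}: {count} prompts")
```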
Why Multilingual Safety Matters
Language models are used by people all over the world. If they are unsafe or biased in one language, it can cause problems for users of that language. Just imagine a language model giving harmful advice in Italian while offering safe and useful information in English. That could lead to misunderstandings and even danger in some situations. Ensuring that language models are safe across all languages is crucial for effective communication and trust.
The Need for Comprehensive Evaluation
Previous efforts to assess language model safety largely focused on English. While that's a start, it misses the mark for a multilingual world. Just because a language model is safe in English doesn't mean it is in French or Spanish. M-ALERT fills this gap by providing a detailed framework to evaluate safety across multiple languages.
Safety Categories in M-ALERT
M-ALERT uses a specific structure to categorize safety risks, following the ALERT taxonomy: 6 macro categories that split into 32 finer micro categories. This detailed breakdown allows for a more in-depth analysis of where models may fail. For example, a model that behaves safely in one category might still be unsafe in another.
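As a rough illustration of that two-level structure, the sketch below represents the taxonomy as a mapping from macro to micro categories. Only crime_tax, crime_propaganda, and substance_cannabis are named in the paper's abstract; everything else here is a hypothetical placeholder.

```python
# Illustrative sketch of the two-level taxonomy: 6 macro categories that
# split into 32 micro categories overall. Only the three micro categories
# below are confirmed by the abstract; the remaining ones are left out.
TAXONOMY = {
    "crime": ["crime_tax", "crime_propaganda"],  # plus further crime_* entries
    "substance": ["substance_cannabis"],         # plus further substance_* entries
    # ... four more macro categories, 32 micro categories in total
}

def macro_of(micro_category: str) -> str:
    """Micro names prefix their macro category, e.g. 'crime_tax' -> 'crime'."""
    return micro_category.split("_", 1)[0]

assert macro_of("substance_cannabis") == "substance"
```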
How Does M-ALERT Work?
When a language model is tested with M-ALERT, it is given prompts that are each linked to a specific risk category. Every response it generates is then assessed by a multilingual safety judge to determine whether it is safe. This process produces an overall safety score as well as specific scores for each category and language.
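That flow fits in a few lines of Python. This is a sketch under assumptions: `generate_response` and `judge_is_safe` are hypothetical stand-ins for the model under test and the safety judge (the Reference Links point to Llama-Guard-3-8B, a plausible judge, but that is an inference here).

```python
# Sketch of the M-ALERT evaluation loop: prompt the model, have a judge
# label each response, and aggregate per-category safety scores.
# `generate_response` and `judge_is_safe` are hypothetical callables.
from collections import defaultdict

def evaluate(prompts, generate_response, judge_is_safe):
    safe = defaultdict(int)   # responses judged safe, per category
    total = defaultdict(int)  # all responses, per category
    for item in prompts:      # item: {"prompt": str, "category": str}
        response = generate_response(item["prompt"])
        total[item["category"]] += 1
        if judge_is_safe(item["prompt"], response):
            safe[item["category"]] += 1
    # Safety score per category: the fraction of responses judged safe.
    return {cat: safe[cat] / total[cat] for cat in total}
```

Averaging the per-category scores (or pooling all responses) then gives the overall safety score mentioned above.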
Challenges in Translation
One of the biggest challenges in building M-ALERT was ensuring that the translations of prompts were accurate. Translation is tricky, and what sounds right in one language might not in another. M-ALERT therefore uses a translation pipeline that combines multiple machine-translation models with automatic quality checks to keep only high-quality output. This step is crucial: if a prompt is mistranslated, the safety test no longer measures what it was designed to measure.
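The sketch below illustrates this translate-then-verify idea under stated assumptions: prompts are translated with one of the linked machine-translation models (Helsinki-NLP/opus-mt-en-de), and only translations that clear a quality threshold are kept. `quality_score` is a hypothetical placeholder for a reference-free quality-estimation model such as MetricX or CometKiwi (both appear in the Reference Links), and the 0.8 threshold is invented for illustration.

```python
# Sketch of translate-then-verify for English -> German prompts.
# The translator is a real Hugging Face model from the Reference Links;
# `quality_score` is a hypothetical stand-in for a QE model (e.g. MetricX
# or CometKiwi), and the threshold value is made up.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

def quality_score(source: str, translation: str) -> float:
    """Placeholder: plug in a reference-free quality-estimation model here."""
    raise NotImplementedError

def translate_with_check(prompts, threshold=0.8):
    kept = []
    for src in prompts:
        mt = translator(src)[0]["translation_text"]
        if quality_score(src, mt) >= threshold:  # keep only high-quality output
            kept.append((src, mt))
    return kept
```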
Testing Language Models
Ten different language models were tested using M-ALERT. The aim was to identify strengths and weaknesses in their safety performance. Some models were generally safe, but they exhibited inconsistencies across languages. For instance, a model might be safe in German but raise safety flags in Italian. Others showed consistently unsafe behavior in specific categories.
Results of the Testing
The testing revealed noticeable safety discrepancies across languages. While some models like Gemma-2 performed well in multiple languages, others, such as aya-23 and c4ai-command, struggled significantly. When evaluated, nearly all models showed at least some unsafe outputs in one or more languages.
Highlights of Inconsistent Safety
One surprising finding was that safety wasn't always consistent across languages. For example, a model might perform safely in English but not in Italian for the same prompt. This inconsistency raises questions about how language models are trained and evaluated. It appears that the models might need better data or methods to handle language-specific nuances.
Understanding Policy Implications
Safety isn’t just about being free from harmful content; it also involves understanding different cultural contexts. For instance, something that is considered safe in one country might be viewed differently in another because of local laws and cultural norms. M-ALERT helps identify these differences, allowing models to be fine-tuned for specific regions or groups.
The Role of Model Size
Another interesting aspect of the research was the impact of model size on safety. Surprisingly, smaller models were sometimes found to be safer than larger ones. This suggests that just adding more parameters to a model doesn’t necessarily improve safety. It’s more about how these models are trained and the quality of the data they use.
Future Directions
While M-ALERT has made significant contributions to understanding language model safety, there's still much work to be done. Future studies could focus on refining translation methods and expanding the benchmark to more languages. Strengthening the automated judges used to score responses would also help ensure reliable results across categories and languages.
Conclusion
In summary, M-ALERT represents a significant step forward in evaluating the safety of language models in various languages. By identifying inconsistencies and highlighting particular risks, it encourages further research into safer and more reliable models. After all, when it comes to language models, it’s essential to ensure they are not just smart but also safe for everyone, no matter what language they speak. The future of language models should be bright and inclusive, ensuring that all users can benefit from technology without fear.
Humorous Takeaways
So, if you think of language models as your chatty, slightly unpredictable friends, M-ALERT is like the safety helmet you wear when you hang out with them. It can help prevent any embarrassing or dangerous situations that could arise! Just remember, not all friends are created equal, and some may need more guidance than others.
In the end, whether you’re chatting in English, French, German, Italian, or Spanish, everyone deserves a safe conversation, just like how everyone deserves a cake that doesn't collapse halfway through the party!
Title: LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps
Abstract: Building safe Large Language Models (LLMs) across multiple languages is essential in ensuring both safe access and linguistic diversity. To this end, we introduce M-ALERT, a multilingual benchmark that evaluates the safety of LLMs in five languages: English, French, German, Italian, and Spanish. M-ALERT includes 15k high-quality prompts per language, totaling 75k, following the detailed ALERT taxonomy. Our extensive experiments on 10 state-of-the-art LLMs highlight the importance of language-specific safety analysis, revealing that models often exhibit significant inconsistencies in safety across languages and categories. For instance, Llama3.2 shows high unsafety in the category crime_tax for Italian but remains safe in other languages. Similar differences can be observed across all models. In contrast, certain categories, such as substance_cannabis and crime_propaganda, consistently trigger unsafe responses across models and languages. These findings underscore the need for robust multilingual safety practices in LLMs to ensure safe and responsible usage across diverse user communities.
Authors: Felix Friedrich, Simone Tedeschi, Patrick Schramowski, Manuel Brack, Roberto Navigli, Huu Nguyen, Bo Li, Kristian Kersting
Last Update: Dec 19, 2024
Language: English
Source URL: https://arxiv.org/abs/2412.15035
Source PDF: https://arxiv.org/pdf/2412.15035
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to arXiv for use of its open access interoperability.
Reference Links
- https://huggingface.co/datasets/felfri/M-ALERT
- https://huggingface.co/Helsinki-NLP/opus-mt-en-de
- https://github.com/google-research/metricx
- https://huggingface.co/Unbabel/wmt23-cometkiwi-da-xxl
- https://huggingface.co/meta-llama/Llama-Guard-3-8B
- https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
- https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
- https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct
- https://huggingface.co/mistralai/Ministral-8B-Instruct-2410
- https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3
- https://huggingface.co/mistralai/Mistral-Small-Instruct-2409
- https://huggingface.co/CohereForAI/aya-23-8B
- https://huggingface.co/CohereForAI/aya-expanse-32b
- https://huggingface.co/CohereForAI/c4ai-command-r-08-2024
- https://huggingface.co/google/gemma-2-9b-it
- https://huggingface.co/meta-llama/Meta-Llama-3-8B
- https://huggingface.co/meta-llama/Llama-3.1-8B
- https://huggingface.co/meta-llama/Llama-3.2-3B
- https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct
- https://huggingface.co/CohereForAI/aya-expanse-8b
- https://huggingface.co/google/gemma-2-2b
- https://huggingface.co/google/gemma-2-2b-it
- https://huggingface.co/google/gemma-2-27b
- https://huggingface.co/google/gemma-2-27b-it
- https://huggingface.co/google/gemma-2-9b
- https://huggingface.co/Qwen/Qwen2.5-0.5B
- https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct
- https://huggingface.co/Qwen/Qwen2.5-1.5B
- https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct
- https://huggingface.co/Qwen/Qwen2.5-3B
- https://huggingface.co/Qwen/Qwen2.5-3B-Instruct
- https://huggingface.co/Qwen/Qwen2.5-7B
- https://huggingface.co/Qwen/Qwen2.5-7B-Instruct
- https://huggingface.co/Qwen/Qwen2.5-14B
- https://huggingface.co/Qwen/Qwen2.5-14B-Instruct
- https://huggingface.co/Qwen/Qwen2.5-32B
- https://huggingface.co/Qwen/Qwen2.5-32B-Instruct
- https://huggingface.co/Qwen/Qwen2.5-72B
- https://huggingface.co/Qwen/Qwen2.5-72B-Instruct
- https://huggingface.co/utter-project/EuroLLM-9B-Instruct
- https://huggingface.co/openGPT-X/Teuken-7B-instruct-commercial-v0.4
- https://huggingface.co/aurora-m/aurora-m-biden-harris-redteamed