LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps
Abstract
Building safe Large Language Models (LLMs) across multiple languages is essential to ensuring both safe access and linguistic diversity. To this end, we introduce M-ALERT, a multilingual benchmark that evaluates the safety of LLMs in five languages: English, French, German, Italian, and Spanish. M-ALERT includes 15k high-quality prompts per language, totaling 75k, following the detailed ALERT taxonomy. Our extensive experiments on 10 state-of-the-art LLMs highlight the importance of language-specific safety analysis, revealing that models often exhibit significant inconsistencies in safety across languages and categories. For instance, Llama3.2 produces a high rate of unsafe responses in the crime_tax category for Italian but remains safe in the other languages. Similar differences can be observed across all models. In contrast, certain categories, such as substance_cannabis and crime_propaganda, consistently trigger unsafe responses across models and languages. These findings underscore the need for robust multilingual safety practices in LLMs to ensure safe and responsible usage across diverse user communities.
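To make the evaluation setup concrete, below is a minimal sketch of how per-language, per-category safety rates could be computed on M-ALERT-style data. The field names (`language`, `category`, `prompt`) and the `generate` and `is_safe` callables are illustrative assumptions, not the paper's actual pipeline.

```python
# Hedged sketch: per-(language, category) safety rates on M-ALERT-style data.
# Field names and the judge interface are assumptions for illustration only.
from collections import defaultdict

def safety_rates(examples, generate, is_safe):
    """examples: iterable of dicts with 'language', 'category', 'prompt' keys.
    generate: model under test, prompt -> response string.
    is_safe: safety judge, (prompt, response) -> bool."""
    safe = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        key = (ex["language"], ex["category"])
        total[key] += 1
        if is_safe(ex["prompt"], generate(ex["prompt"])):
            safe[key] += 1
    # Safety rate = fraction of responses judged safe per (language, category).
    return {key: safe[key] / total[key] for key in total}
```

A cross-language gap like the crime_tax example above would show up here as a low rate for one language key while the other four language keys in the same category stay high.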
Community
We introduce M-ALERT, a multilingual benchmark with 75,000 safety prompts across five languages, to evaluate the safety of large language models (LLMs). Our study reveals significant inconsistencies in safety performance across languages and categories, with certain topics, such as crime propaganda and substance use, consistently triggering unsafe responses. While some models excel in specific languages or categories, inter-language consistency remains low, even for high-performing models. These findings highlight the need for language-specific safety tuning, policy-aware assessments, and improvements in translation pipelines to ensure robust multilingual safety practices. Our work aims to advance AI safety research by providing a detailed evaluation framework and actionable insights for developing safer and more inclusive LLMs.
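Since M-ALERT's prompts are parallel across the five languages, one natural way to quantify the low inter-language consistency mentioned above is the fraction of parallel prompts on which a model receives the same safety verdict in every language. The sketch below implements that metric; it is an illustrative assumption about how consistency could be measured, not necessarily the paper's exact definition.

```python
# Hedged sketch: agreement-based cross-language consistency on a parallel
# safety benchmark. The (prompt_id, language, is_safe) tuple format is an
# assumed representation of per-language judge verdicts.
from collections import defaultdict

def cross_language_consistency(verdicts):
    """verdicts: iterable of (prompt_id, language, is_safe) triples,
    where each prompt_id appears once per language."""
    by_prompt = defaultdict(set)
    for prompt_id, _language, is_safe in verdicts:
        by_prompt[prompt_id].add(is_safe)
    # A prompt is consistent when every language got the same verdict.
    consistent = sum(1 for v in by_prompt.values() if len(v) == 1)
    return consistent / len(by_prompt)
```

Under this metric, a model can score well on average safety in each language separately yet still have low consistency, which matches the pattern reported for high-performing models.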
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models (2024)
- The Roles of English in Evaluating Multilingual Language Models (2024)
- Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement (2024)
- P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs (2024)
- MILU: A Multi-task Indic Language Understanding Benchmark (2024)
- Multilingual Large Language Models: A Systematic Survey (2024)
- Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs (2024)
- Benchmarking LLM Guardrails in Handling Multilingual Toxicity (2024)
- Benchmarking Linguistic Diversity of Large Language Models (2024)
- Domain-Specific Translation with Open-Source Large Language Models: Resource-Oriented Analysis (2024)
- Multilingual LLMs Inherently Reward In-Language Time-Sensitive Semantic Alignment for Low-Resource Languages (2024)