---
license: cc-by-4.0
language:
- it
---

# GeNTE Evaluator

The **Gender-Neutral Translation (GeNTE) Evaluator** is a sequence classification model that labels Italian text as gender-neutral or gendered. It is used to evaluate inclusive translations into Italian on the [GeNTE corpus](https://huggingface.co/datasets/FBK-MT/GeNTE).
It is built by fine-tuning the RoBERTa-based [UmBERTo model](https://huggingface.co/Musixmatch/umberto-wikipedia-uncased-v1).
More details on the training process and on reproducing the results can be found in the [official repository](https://github.com/hlt-mt/fbk-NEUTR-evAL/blob/main/solutions/GeNTE.md) and the [paper](https://aclanthology.org/2023.emnlp-main.873/).

## Usage

You can use the GeNTE Evaluator as follows:

```
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# load the tokenizer of UmBERTo
tokenizer = AutoTokenizer.from_pretrained("Musixmatch/umberto-wikipedia-uncased-v1", do_lower_case=False)

# load GeNTE Evaluator
model = AutoModelForSequenceClassification.from_pretrained("FBK-MT/GeNTE-evaluator")

# neutral example
sample = "Condividiamo il parere di chi ha presentato la relazione che ha posto notevole enfasi sull'informazione in relazione ai rischi e sulla trasparenza, in particolare nel campo sanitario e della sicurezza."
inputs = tokenizer(sample, return_tensors='pt')

# run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# pick the class with the highest logit
predicted_label = torch.argmax(logits, dim=1).item()
print(predicted_label)  # 0 is neutral, 1 is gendered
```
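
The snippet above prints only a class id. As a minimal sketch, the helpers below turn a pair of raw logits into a label name with its softmax probability, and summarize a list of predicted ids as the share of neutral sentences. The names `ID2LABEL`, `describe_prediction`, and `neutral_share` are hypothetical and not part of the model or the official repository. The example logits are made up, and the label mapping is taken from the comment in the usage code.

```python
import math

# Label mapping taken from the comment in the usage snippet (0 neutral, 1 gendered).
ID2LABEL = {0: "neutral", 1: "gendered"}


def softmax(logits):
    """Turn a list of raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe_prediction(logits):
    """Return (label name, probability of that label) for one sentence."""
    probs = softmax(logits)
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[label_id], probs[label_id]


def neutral_share(label_ids):
    """Fraction of sentences labelled neutral (0) in a list of predicted ids."""
    if not label_ids:
        raise ValueError("label_ids must not be empty")
    return sum(1 for i in label_ids if i == 0) / len(label_ids)


# Hypothetical logits, e.g. obtained by calling `.logits[0].tolist()` on the model output.
example_logits = [2.0, -1.0]
label, confidence = describe_prediction(example_logits)
print(label, round(confidence, 3))

# Hypothetical predicted ids for four sentences.
print(neutral_share([0, 0, 1, 0]))
```

When working with tensors directly, `torch.softmax(logits, dim=-1)` computes the same probabilities. The pure-Python version is used here only to keep the sketch free of dependencies.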

## Citation

```
@inproceedings{piergentili-etal-2023-hi,
    title = "Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the {G}e{NTE} Corpus",
    author = "Piergentili, Andrea  and
      Savoldi, Beatrice  and
      Fucci, Dennis  and
      Negri, Matteo  and
      Bentivogli, Luisa",
    editor = "Bouamor, Houda  and
      Pino, Juan  and
      Bali, Kalika",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.emnlp-main.873",
    doi = "10.18653/v1/2023.emnlp-main.873",
    pages = "14124--14140",
    abstract = "Gender inequality is embedded in our communication practices and perpetuated in translation technologies. This becomes particularly apparent when translating into grammatical gender languages, where machine translation (MT) often defaults to masculine and stereotypical representations by making undue binary gender assumptions. Our work addresses the rising demand for inclusive language by focusing head-on on gender-neutral translation from English to Italian. We start from the essentials: proposing a dedicated benchmark and exploring automated evaluation methods. First, we introduce GeNTE, a natural, bilingual test set for gender-neutral translation, whose creation was informed by a survey on the perception and use of neutral language. Based on GeNTE, we then overview existing reference-based evaluation approaches, highlight their limits, and propose a reference-free method more suitable to assess gender-neutral translation.",
}
```

## Contributions

Thanks to [@dfucci](https://huggingface.co/dfucci) for adding this model.