s-nlp/mbart-detox-en-ru


etomoscow committed on Sep 20, 2023

Commit ea1823e • 1 Parent(s): 52f1846

Update README.md

Files changed (1)

README.md +53 -1


@@ -6,4 +6,56 @@ language:
 - en
 library_name: transformers
 pipeline_tag: text2text-generation
----

+---
+## Model Description
+This is the model presented in the paper "Exploring Methods for Cross-lingual Text Style Transfer: The Case of Text Detoxification".
+The model is based on [mBART-large-50](https://huggingface.co/facebook/mbart-large-50) and trained on two parallel detoxification corpora: [ParaDetox](https://huggingface.co/datasets/s-nlp/paradetox) and [RuDetox](https://github.com/s-nlp/russe_detox_2022/tree/main/data). More details about this model are in the paper.
+## Usage
+1. Model loading.
+```python
+from transformers import MBartForConditionalGeneration, AutoTokenizer
+model = MBartForConditionalGeneration.from_pretrained("s-nlp/mBART_EN_RU").cuda()
+tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-50")
+```
+2. Detoxification utility.
+```python
+def paraphrase(text, model, tokenizer, n=None, max_length="auto", beams=3):
+    texts = [text] if isinstance(text, str) else text
+    inputs = tokenizer(texts, return_tensors="pt", padding=True)["input_ids"].to(
+        model.device
+    )
+    if max_length == "auto":
+        max_length = inputs.shape[1] + 10
+    result = model.generate(
+        inputs,
+        num_return_sequences=n or 1,
+        do_sample=True,
+        temperature=1.0,
+        repetition_penalty=10.0,
+        max_length=max_length,
+        min_length=int(0.5 * max_length),
+        num_beams=beams,
+        # Assumes tokenizer.tgt_lang has been set (e.g. "en_XX" or "ru_RU"); it may default to None.
+        forced_bos_token_id=tokenizer.lang_code_to_id[tokenizer.tgt_lang],
+    )
+    texts = [tokenizer.decode(r, skip_special_tokens=True) for r in result]
+    if not n and isinstance(text, str):
+        return texts[0]
+    return texts
+```
+## Citation
+TBD
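
The `paraphrase` utility added in this README accepts either a single string or a list of strings. For longer lists, one plausible way to call it is in fixed-size batches, since padding and the `"auto"` length limit are both derived from the longest input in a call. The sketch below is an illustration only: `detoxify_in_batches`, `batch_size`, and the stand-in `fake_paraphrase` are hypothetical names that do not appear in the README, and the stand-in replaces the real model call so the example runs without downloading weights.

```python
from typing import Callable, Iterable, List


def detoxify_in_batches(
    texts: Iterable[str],
    paraphrase_fn: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Apply a list-in, list-out paraphrasing callable to texts in fixed-size batches.

    Hypothetical helper, not part of the README: `paraphrase_fn` is expected to
    return exactly one output string per input string.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    items = list(texts)
    outputs: List[str] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        outputs.extend(paraphrase_fn(batch))
    return outputs


# Stand-in for the real model call (assumption: it only upper-cases the text).
def fake_paraphrase(batch: List[str]) -> List[str]:
    return [t.upper() for t in batch]


print(detoxify_in_batches(["a", "b", "c"], fake_paraphrase, batch_size=2))
# prints ['A', 'B', 'C']
```

With the model and tokenizer loaded as in the README, the stand-in could presumably be swapped for `lambda batch: paraphrase(batch, model, tokenizer)`, which returns one string per input when `n` is left unset.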