---
language:
- ru
tags:
- mbart
inference:
  parameters:
    no_repeat_ngram_size: 4
    num_beams: 5
datasets:
- IlyaGusev/gazeta
- samsum
- samsum_(translated_into_Russian)
widget:
- text: >
    Джефф: Могу ли я обучить модель 🤗 Transformers на Amazon SageMaker? 

    Филипп: Конечно, вы можете использовать новый контейнер для глубокого
    обучения HuggingFace. 

    Джефф: Хорошо.

    Джефф: и как я могу начать? 

    Джефф: где я могу найти документацию? 

    Филипп: ок, ок, здесь можно найти все:
    https://huggingface.co/blog/the-partnership-amazon-sagemaker-and-hugging-face
model-index:
- name: mbart_ruDialogSum
  results:
  - task:
      name: Abstractive Dialogue Summarization
      type: abstractive-text-summarization
    dataset:
      name: SAMSum Corpus (translated to Russian)
      type: samsum
    metrics:
    - name: Validation ROUGE-1
      type: rouge-1
      value: 34.5
    - name: Validation ROUGE-L
      type: rouge-l
      value: 33
    - name: Test ROUGE-1
      type: rouge-1
      value: 31
    - name: Test ROUGE-L
      type: rouge-l
      value: 28
license: cc
---
### 📝 Description

An mBART model for Russian summarization, fine-tuned for **dialogue** summarization.


This model was first fine-tuned by [Ilya Gusev](https://hf.co/IlyaGusev) on the [Gazeta dataset](https://huggingface.co/datasets/IlyaGusev/gazeta). We then **fine-tuned** that model on the [SAMSum dataset](https://huggingface.co/datasets/samsum) **translated into Russian** with the Google Translate API.

🤗 We have also built a **Telegram bot, [@summarization_bot](https://t.me/summarization_bot),** that runs inference with this model. Add it to a chat and get summaries instead of dozens of spam messages! 🤗


### ❓ How to use with code
```python
from transformers import AutoTokenizer, MBartForConditionalGeneration

# Download model and tokenizer
model_name = "Kirili4ik/mbart_ruDialogSum"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)
model.eval()

article_text = "..."

input_ids = tokenizer(
    [article_text],
    max_length=600,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)["input_ids"]

output_ids = model.generate(
    input_ids=input_ids,
    top_k=0,
    num_beams=3,
    no_repeat_ngram_size=3
)[0]


summary = tokenizer.decode(output_ids, skip_special_tokens=True)
print(summary)
```
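
If your chat arrives as separate messages rather than one string, a small wrapper can help. The sketch below is only an illustration: `format_dialogue` and `summarize` are hypothetical helper names, and the `Speaker: text` layout with one message per line is an assumption based on the widget example in this card, not a documented input format. Unlike the snippet above, it skips `max_length` padding and passes the attention mask explicitly.

```python
from typing import Iterable, Tuple


def format_dialogue(messages: Iterable[Tuple[str, str]]) -> str:
    """Join (speaker, text) pairs into 'Speaker: text' lines, skipping empty messages.

    The one-message-per-line layout is an assumption taken from the widget example.
    """
    lines = [
        f"{speaker.strip()}: {text.strip()}"
        for speaker, text in messages
        if text.strip()
    ]
    return "\n".join(lines)


def summarize(dialogue: str, tokenizer, model, max_length: int = 600) -> str:
    """Summarize one dialogue using the beam-search settings from the snippet above."""
    inputs = tokenizer(
        [dialogue],
        max_length=max_length,
        truncation=True,
        return_tensors="pt",
    )
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        num_beams=3,
        no_repeat_ngram_size=3,
    )[0]
    return tokenizer.decode(output_ids, skip_special_tokens=True)
```

With `tokenizer` and `model` loaded as shown above, `summarize(format_dialogue(messages), tokenizer, model)` returns the summary string for a list of `(speaker, text)` pairs.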