Update README.md
README.md
CHANGED
@@ -1,12 +1,31 @@
-

-

### About ruGPT-3 XL model
-The model was trained with a sequence length of 512, using DeepSpeed and Megatron code by the SberDevices team, on an 80B-token dataset for 4 epochs. After that, the model was fine-tuned for 1 epoch with a sequence length of 2048.
-Note! The model has sparse attention blocks.

Total training time was around 10 days on 256 GPUs.
Final perplexity on the test set is 12.05. Model parameters: 1.3B.

+---
+language:
+- ru
+- en
+pipeline_tag: text-generation
+tags:
+- PyTorch
+- Transformers
+- gpt2
+- squad
+- lm-head
+- causal-lm
+thumbnail: "https://github.com/RussianNLP/RusEnQA"
+
+---

+## RusEnQA

+QA for Russian and English based on the [rugpt3xl](https://huggingface.co/sberbank-ai/rugpt3xl) model

+#### Fine-tuning format:
+```
+"<s>paragraph: " + eng_context + "\nlang: rus\nquestion: " + rus_question + " answer: " + rus_answer + "</s>"
+```

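For illustration, a minimal Python sketch of this format. The helper names below are ours, not part of RusEnQA; the `lang` field is taken to mark the language of the question and answer (`rus` above), and the answer follows a single space rather than a newline:

```python
# Minimal sketch of the RusEnQA prompt format above. build_sample and
# build_prompt are illustrative helper names, not RusEnQA code.

def build_sample(context: str, lang: str, question: str, answer: str) -> str:
    # Full fine-tuning sample: context, language tag, question, gold answer.
    return ("<s>paragraph: " + context
            + "\nlang: " + lang
            + "\nquestion: " + question
            + " answer: " + answer + "</s>")

def build_prompt(context: str, lang: str, question: str) -> str:
    # Inference prompt: everything up to "answer:"; the model completes it.
    return ("<s>paragraph: " + context
            + "\nlang: " + lang
            + "\nquestion: " + question
            + " answer:")

print(build_sample("GPT-3 is an autoregressive language model.", "rus",
                   "Что такое GPT-3?", "Авторегрессионная языковая модель."))
```
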
### About ruGPT-3 XL model
+The model was trained with a sequence length of 512, using [DeepSpeed](https://github.com/microsoft/DeepSpeed) and [Megatron](https://github.com/NVIDIA/Megatron-LM) code by the [SberDevices](https://sberdevices.ru/) team, on an 80B-token dataset for 4 epochs. After that, the model was fine-tuned for 1 epoch with a sequence length of 2048.
+*Note! The model has sparse attention blocks.*

Total training time was around 10 days on 256 GPUs.
Final perplexity on the test set is 12.05. Model parameters: 1.3B.
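
For completeness, a hypothetical inference sketch. The rugpt3xl checkpoint is not a plain `transformers` GPT-2 model: per its model card it is loaded through the sparse-attention wrapper from the [ru-gpts](https://github.com/sberbank-ai/ru-gpts) repository, and the exact call signatures below are assumptions, not a verified API:

```python
# Hypothetical inference sketch, NOT verified code. RuGPT3XL is the wrapper
# from the sberbank-ai/ru-gpts repository (src/xl_wrapper.py) and needs
# DeepSpeed with sparse attention; the generate() signature and its return
# format (a list of full generated strings) are assumptions based on the
# rugpt3xl model card.
from src.xl_wrapper import RuGPT3XL

gpt = RuGPT3XL.from_pretrained("sberbank-ai/rugpt3xl", seq_len=512)

prompt = ("<s>paragraph: GPT-3 is an autoregressive language model."
          "\nlang: rus"
          "\nquestion: Что такое GPT-3? answer:")

out = gpt.generate(text=prompt, max_length=128)[0]
# Keep only the completion after "answer:", cut at the end-of-text marker.
answer = out[len(prompt):].split("</s>")[0].strip()
print(answer)
```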