Update README.md #5
by typowy-93 · opened

README.md CHANGED
@@ -62,7 +62,6 @@ language:
 - et
 - fi
 - hu
-
 pipeline_tag: text-generation
 tags:
 - multilingual
@@ -75,7 +74,7 @@ tags:
 datasets:
 - mc4
 - wikipedia
-thumbnail:
+thumbnail: https://github.com/sberbank-ai/mgpt
 ---

 # Multilingual GPT model
@@ -140,4 +139,4 @@ Languages:
 ## Details
 The model was trained with sequence length 512 using Megatron and Deepspeed libs by [SberDevices](https://sberdevices.ru/) team on a dataset of 600 GB of texts in 61 languages. The model has seen 440 billion BPE tokens in total.

-Total training time was around 14 days on 256 Nvidia V100 GPUs.
+Total training time was around 14 days on 256 Nvidia V100 GPUs.
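For context on what the card metadata in this diff maps to in practice: `pipeline_tag: text-generation` means the model is used as an ordinary causal language model. Below is a minimal usage sketch, not part of the PR itself; the repo id `sberbank-ai/mGPT` is an assumption inferred from the thumbnail URL added in the diff, so adjust it to wherever the model is actually hosted.

```python
# Minimal sketch of loading and sampling from the model this card describes.
# Assumption: the Hub repo id is "sberbank-ai/mGPT" (inferred from the
# thumbnail URL in the diff, not stated in the PR).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sberbank-ai/mGPT")
model = AutoModelForCausalLM.from_pretrained("sberbank-ai/mGPT")

# The card declares pipeline_tag: text-generation, i.e. plain causal LM use.
# Training used sequence length 512, so keep prompt + generation within that.
inputs = tokenizer("The capital of Finland is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```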