Updated README and model
- README.md +13 -0
- pytorch_model.bin +2 -2
README.md
CHANGED
@@ -25,6 +25,19 @@ license: agpl-3.0
 
 # ScandiBERT
 
+Note note: The model has been updated on 2022-09-27
+
+The model was trained on the data shown in the table below. Batch size was 8.8k, the model was trained for 72 epochs on 24 V100 cards for about 2 weeks.
+
+| Language  | Data                                  | Size   |
+|-----------|---------------------------------------|--------|
+| Icelandic | See IceBERT paper                     | 16 GB  |
+| Danish    | Danish Gigaword Corpus (incl Twitter) | 4,7 GB |
+| Norwegian | NCC corpus                            | 42 GB  |
+| Swedish   | Swedish Gigaword Corpus               | 3,4 GB |
+| Faroese   | FC3 + Sosialurinn + Bible             | 69 MB  |
+
+
 Note: At an earlier date a half trained model went up here, it has since been removed. The model has since been updated.
 
 This is a Scandinavian BERT model trained on a large collection of Danish, Faroese, Icelandic, Norwegian and Swedish text. It is currently the highest ranking model on the ScandEval leaderbord https://scandeval.github.io/pretrained/
pytorch_model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:8e43b0598333ac79032964480c450dbd884cdba70c2349439333c86a1252ae22
+size 498276383
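
The new pytorch_model.bin entry is a Git LFS pointer: it records only the SHA-256 digest and byte size of the weights, not the weights themselves. The following is a minimal sketch of checking a downloaded file against that pointer. It assumes the weights were saved locally as pytorch_model.bin (a hypothetical path); the helper names are illustrative and are not part of Git LFS or any library.

```python
import hashlib
import os

# Pointer text copied from the new side of this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:8e43b0598333ac79032964480c450dbd884cdba70c2349439333c86a1252ae22
size 498276383
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if key:
            fields[key] = value
    return fields


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pointer(path, pointer_text):
    """Return True if the file's size and SHA-256 both match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algorithm, _, expected_hex = fields["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError("unsupported oid algorithm: " + algorithm)
    # Compare the cheap size check first, then the full digest.
    if os.path.getsize(path) != int(fields["size"]):
        return False
    return sha256_of_file(path) == expected_hex


if __name__ == "__main__":
    local_path = "pytorch_model.bin"  # hypothetical download location
    if os.path.exists(local_path):
        print("matches pointer:", matches_pointer(local_path, POINTER))
```

Checking the size before hashing avoids reading roughly 498 MB when the download is obviously truncated.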