vesteinn committed on
Commit 0f86e40
1 parent: 90f4917

Updated README and model

Files changed (2)
  1. README.md +13 -0
  2. pytorch_model.bin +2 -2
README.md CHANGED
@@ -25,6 +25,19 @@ license: agpl-3.0
 
 # ScandiBERT
 
+Note note: The model has been updated on 2022-09-27
+
+The model was trained on the data shown in the table below. Batch size was 8.8k, the model was trained for 72 epochs on 24 V100 cards for about 2 weeks.
+
+| Language  | Data                                  | Size   |
+|-----------|---------------------------------------|--------|
+| Icelandic | See IceBERT paper                     | 16 GB  |
+| Danish    | Danish Gigaword Corpus (incl Twitter) | 4,7 GB |
+| Norwegian | NCC corpus                            | 42 GB  |
+| Swedish   | Swedish Gigaword Corpus               | 3,4 GB |
+| Faroese   | FC3 + Sosialurinn + Bible             | 69 MB  |
+
+
 Note: At an earlier date a half trained model went up here, it has since been removed. The model has since been updated.
 
 This is a Scandinavian BERT model trained on a large collection of Danish, Faroese, Icelandic, Norwegian and Swedish text. It is currently the highest ranking model on the ScandEval leaderbord https://scandeval.github.io/pretrained/
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:bde488996e28d79d998ea653e867da63d5ba98c0f0f2a60511b5c370b310055d
-size 498276381
+oid sha256:8e43b0598333ac79032964480c450dbd884cdba70c2349439333c86a1252ae22
+size 498276383
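
The pytorch_model.bin change above touches only a Git LFS pointer, which records the SHA-256 digest and byte size of the real weights file. As a minimal sketch, assuming the weights have already been downloaded to a local path, the pointer can be parsed and checked against that file as shown below. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of this repository or of Git LFS.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the text of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line has the form "<key> <value>".
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    hasher = hashlib.new(pointer["algo"])
    total = 0
    with open(path, "rb") as handle:
        # Read in chunks so a roughly 500 MB file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Pointer contents after this commit, copied from the diff above.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:8e43b0598333ac79032964480c450dbd884cdba70c2349439333c86a1252ae22
size 498276383
"""

pointer = parse_lfs_pointer(NEW_POINTER)
```

Calling `matches_pointer("pytorch_model.bin", pointer)` on a local copy (the path is an assumption) would then indicate whether that copy corresponds to the updated weights rather than the earlier upload.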