vesteinn/ScandiBERT



vesteinn committed on Sep 27, 2022

Commit 0f86e40 (1 parent: 90f4917)

Updated README and model

Files changed (2)

README.md +13 -0
pytorch_model.bin +2 -2

README.md CHANGED

@@ -25,6 +25,19 @@ license: agpl-3.0
 # ScandiBERT
+Note note: The model was updated on 2022-09-27.
+The model was trained on the data shown in the table below. Batch size was 8.8k; the model was trained for 72 epochs on 24 V100 GPUs for about two weeks.
+| Language  | Data                                   | Size   |
+|-----------|----------------------------------------|--------|
+| Icelandic | See IceBERT paper                      | 16 GB  |
+| Danish    | Danish Gigaword Corpus (incl. Twitter) | 4.7 GB |
+| Norwegian | NCC corpus                             | 42 GB  |
+| Swedish   | Swedish Gigaword Corpus                | 3.4 GB |
+| Faroese   | FC3 + Sosialurinn + Bible              | 69 MB  |
 Note: At an earlier date a half-trained model was uploaded here; it has since been removed and the model has been updated.
 This is a Scandinavian BERT model trained on a large collection of Danish, Faroese, Icelandic, Norwegian and Swedish text. It is currently the highest-ranking model on the ScandEval leaderboard https://scandeval.github.io/pretrained/
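The corpus table and training note in the README can be turned into a few derived numbers. The sketch below is illustrative only and is not part of the repository: it sums the listed corpus sizes, computes each language's share, and makes a rough GPU-hour estimate. It assumes decimal units (69 MB taken as 0.069 GB) and treats "about two weeks" as exactly 14 days; the names `CORPUS_GB`, `corpus_shares` and `gpu_hours` are hypothetical.

```python
# Corpus sizes from the README table, in GB.
# Assumption: decimal units, so 69 MB is taken as 0.069 GB.
CORPUS_GB = {
    "Icelandic": 16.0,
    "Danish": 4.7,
    "Norwegian": 42.0,
    "Swedish": 3.4,
    "Faroese": 0.069,
}


def corpus_shares(sizes):
    """Return the total size and each language's fraction of that total."""
    total = sum(sizes.values())
    return total, {lang: size / total for lang, size in sizes.items()}


def gpu_hours(num_gpus, days):
    """Rough compute estimate: number of GPUs x days x 24 hours per day."""
    return num_gpus * days * 24


if __name__ == "__main__":
    total, shares = corpus_shares(CORPUS_GB)
    print(f"Total: {total:.2f} GB")
    # Print languages from largest to smallest share of the training data.
    for lang, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{lang:10s} {share:6.1%}")
    # Assumption: "about two weeks" on 24 V100s is taken as exactly 14 days.
    print(f"Approx. GPU-hours: {gpu_hours(24, 14)}")
```

Under these assumptions the corpora total about 66.2 GB, of which Norwegian (NCC) is roughly 63% and Faroese about 0.1%, and the run comes to roughly 8,064 V100 GPU-hours.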

pytorch_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:bde488996e28d79d998ea653e867da63d5ba98c0f0f2a60511b5c370b310055d
-size 498276381
+oid sha256:8e43b0598333ac79032964480c450dbd884cdba70c2349439333c86a1252ae22
+size 498276383
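The `pytorch_model.bin` change is a Git LFS pointer. The repository stores only the object's SHA-256 (`oid`) and its size in bytes, while the roughly 498 MB weights file lives in LFS storage. The following is a minimal sketch that assumes the standard pointer layout of one `key value` pair per line. It parses the new pointer and checks a locally downloaded copy of the weights against it. `parse_lfs_pointer`, `verify_blob` and any local file path are hypothetical names, not part of Git LFS or the Hugging Face tooling.

```python
import hashlib
import os

# The new pointer from the diff (lines without the leading "+").
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:8e43b0598333ac79032964480c450dbd884cdba70c2349439333c86a1252ae22
size 498276383
"""


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into its version, SHA-256 digest and byte size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def verify_blob(path, pointer):
    """Return True if the file at `path` matches the pointer's size and SHA-256."""
    # Cheap size check first, so a mismatched file is rejected without hashing it.
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so a large weights file is never fully loaded into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["sha256"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER)
    print(pointer["size"], pointer["sha256"][:12])
```

For example, `verify_blob("pytorch_model.bin", parse_lfs_pointer(POINTER))` would return `True` only for a copy of the weights from this commit. The old pointer (size 498276381, a different digest) would reject it.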