vesteinn committed on
Commit
c822432
1 Parent(s): 829462e

Create README.md

Files changed (1)
  1. README.md +46 -0
README.md ADDED
@@ -0,0 +1,46 @@
+ ---
+ language: is
+ widget:
+ - text: Má bjóða þér <mask> í kvöld?
+ - text: Forseti <mask> er ágæt.
+ - text: Súpan var <mask> á bragðið.
+ tags:
+ - roberta
+ - icelandic
+ - masked-lm
+ - pytorch
+ license: agpl-3.0
+ ---
+
+ **We do not recommend using this model other than for comparison with the other IceBERT models.**
+
+ # IceBERT-mC4-is
+
+ This model was trained with fairseq using the RoBERTa-base architecture on the Icelandic part of the mC4 dataset. It is one of many models we have trained for Icelandic; see the paper referenced below for further details.
+
+ ## Citation
+
+ The model is described in the paper [https://arxiv.org/abs/2201.05601](https://arxiv.org/abs/2201.05601). Please cite the paper if you make use of the model.
+
+ ```
+ @article{DBLP:journals/corr/abs-2201-05601,
+   author     = {V{\'{e}}steinn Sn{\ae}bjarnarson and
+                 Haukur Barri S{\'{\i}}monarson and
+                 P{\'{e}}tur Orri Ragnarsson and
+                 Svanhv{\'{\i}}t Lilja Ing{\'{o}}lfsd{\'{o}}ttir and
+                 Haukur P{\'{a}}ll J{\'{o}}nsson and
+                 Vilhj{\'{a}}lmur {\TH}orsteinsson and
+                 Hafsteinn Einarsson},
+   title      = {A Warm Start and a Clean Crawled Corpus - {A} Recipe for Good Language
+                 Models},
+   journal    = {CoRR},
+   volume     = {abs/2201.05601},
+   year       = {2022},
+   url        = {https://arxiv.org/abs/2201.05601},
+   eprinttype = {arXiv},
+   eprint     = {2201.05601},
+   timestamp  = {Thu, 20 Jan 2022 14:21:35 +0100},
+   biburl     = {https://dblp.org/rec/journals/corr/abs-2201-05601.bib},
+   bibsource  = {dblp computer science bibliography, https://dblp.org}
+ }
+ ```
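
The README describes a masked language model and lists three `<mask>` widget sentences, so a comparison against other IceBERT models could start from a sketch like the one below. It is not part of the commit. The repository id in `MODEL_ID` is an assumption (the README does not state it), and `top_predictions` and `load_fill_mask` are hypothetical helper names; the only library call used is the `fill-mask` pipeline from Hugging Face `transformers`, assuming the checkpoint is published in a format that library can load.

```python
from typing import Callable, List, Tuple

# Assumption: the Hub repository id is not given in the README; this value is a
# guess built from the committer name and the model title.
MODEL_ID = "vesteinn/IceBERT-mC4-is"

# Widget sentences copied from the README front matter.
WIDGET_SENTENCES = [
    "Má bjóða þér <mask> í kvöld?",
    "Forseti <mask> er ágæt.",
    "Súpan var <mask> á bragðið.",
]


def top_predictions(fill_mask: Callable, text: str, k: int = 3) -> List[Tuple[str, float]]:
    """Return up to k (token, score) pairs for the single <mask> in `text`.

    `fill_mask` is any callable with the fill-mask pipeline interface:
    it takes a string and `top_k`, and returns a list of dicts that
    contain at least "token_str" and "score".
    """
    results = fill_mask(text, top_k=k)
    return [(r["token_str"].strip(), float(r["score"])) for r in results]


def load_fill_mask(model_id: str = MODEL_ID) -> Callable:
    """Build a fill-mask pipeline; downloads the model on first use."""
    # Imported lazily so the helper above can be used without transformers.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)
```

To try it, call `fill_mask = load_fill_mask()` and then `top_predictions(fill_mask, sentence)` for each entry in `WIDGET_SENTENCES`; passing a different repository id to `load_fill_mask` gives the predictions of another model for a side-by-side comparison.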