dumitrescustefan committed
Commit 8576cf5
Parent: 9718c77

Update README.md

Files changed (1): README.md (+28, -1)
@@ -43,12 +43,39 @@ The baseline is the [Multilingual BERT](https://github.com/google-research/bert/
 The model is trained on the following corpora (stats in the table below are after cleaning):
 
 | Corpus | Lines(M) | Words(M) | Chars(B) | Size(GB) |
-|----------- |:--------: |:--------: |:--------: |:--------: |
+|-----------|:--------:|:--------:|:--------:|:--------:|
 | OPUS | 55.05 | 635.04 | 4.045 | 3.8 |
 | OSCAR | 33.56 | 1725.82 | 11.411 | 11 |
 | Wikipedia | 1.54 | 60.47 | 0.411 | 0.4 |
 | **Total** | **90.15** | **2421.33** | **15.867** | **15.2** |
 
+### Citation
+
+If you use this model in a research paper, I'd kindly ask you to cite the following paper:
+
+```
+Stefan Dumitrescu, Andrei-Marius Avram, and Sampo Pyysalo. 2020. The birth of Romanian BERT. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4324–4328, Online. Association for Computational Linguistics.
+```
+
+or, in bibtex:
+
+```
+@inproceedings{dumitrescu-etal-2020-birth,
+    title = "The birth of {R}omanian {BERT}",
+    author = "Dumitrescu, Stefan and
+      Avram, Andrei-Marius and
+      Pyysalo, Sampo",
+    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
+    month = nov,
+    year = "2020",
+    address = "Online",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2020.findings-emnlp.387",
+    doi = "10.18653/v1/2020.findings-emnlp.387",
+    pages = "4324--4328",
+}
+```
+
 #### Acknowledgements
 
 - We'd like to thank [Sampo Pyysalo](https://github.com/spyysalo) from TurkuNLP for helping us out with the compute needed to pretrain the v1.0 BERT models. He's awesome!
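As a quick sanity check on the corpus table in the diff above (not part of the commit itself), the per-corpus figures do add up to the stated totals:

```python
# Sanity check (not part of the commit): the per-corpus rows in the README
# table sum to the stated totals of 90.15M lines, 2421.33M words,
# 15.867B chars, and 15.2 GB.
rows = {
    "OPUS":      (55.05, 635.04, 4.045, 3.8),
    "OSCAR":     (33.56, 1725.82, 11.411, 11.0),
    "Wikipedia": (1.54, 60.47, 0.411, 0.4),
}
totals = tuple(round(sum(col), 3) for col in zip(*rows.values()))
print(totals)  # (90.15, 2421.33, 15.867, 15.2)
```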
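Since the README this commit edits is a model card for the Romanian BERT model described in the cited paper, a short usage sketch may help readers. It is not part of the commit, and the Hub model ID below is an assumption inferred from the committer's username; it does not appear anywhere in this diff.

```python
# Minimal sketch (not part of this commit): loading the Romanian BERT model
# the README describes, via the Hugging Face transformers library.
# ASSUMPTION: the model ID below is inferred from the committer's username
# and is not stated in this diff; substitute the real Hub ID if it differs.
from transformers import AutoTokenizer, AutoModel

model_id = "dumitrescustefan/bert-base-romanian-cased-v1"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Tokenize a Romanian sentence and run a forward pass.
inputs = tokenizer("Acesta este un test.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 768]) for a BERT-base model
```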