fix typos
README.md CHANGED
@@ -20,7 +20,7 @@ license: apache-2.0
 | Model name | Number of layers | Attention Heads | Embedding Dimension | Total Parameters |
 | :------: | :---: | :---: | :---: | :---: |
 | `gpt-fr-cased-small` | 12 | 12 | 768 | 124 M |
-| `gpt-fr-cased-base` | 24 | 14 |
+| `gpt-fr-cased-base` | 24 | 14 | 1,792 | 1,017 B |

 ## Intended uses & limitations

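As a quick check of the table in the hunk above, the architecture hyper-parameters can be read directly from the published model configs with the transformers library. This is a minimal sketch; the hub ids `asi/gpt-fr-cased-small` and `asi/gpt-fr-cased-base` are assumptions mirroring the model names in the table, not something stated in this commit.

```python
# Minimal sketch: read the architecture hyper-parameters from the model configs.
# The hub ids below are assumptions, not specified by this commit.
from transformers import AutoConfig

for model_id in ("asi/gpt-fr-cased-small", "asi/gpt-fr-cased-base"):
    cfg = AutoConfig.from_pretrained(model_id)
    # For the small model this should print 12 layers, 12 heads and a 768-dim embedding,
    # matching the corresponding row of the table above.
    print(model_id, cfg.n_layer, cfg.n_head, cfg.n_embd)
```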
@@ -70,7 +70,7 @@ We created a dedicated corpus to train our generative model. Indeed the model us

 ## Training procedure

-We pre-trained the model on the new CNRS (French National Centre for Scientific Research) [Jean Zay](http://www.idris.fr/eng/jean-zay/) supercomputer. We perform the training within a total of 140 hours of computation on Tesla V-100 hardware (TDP of 300W). The training was distributed on 4 compute nodes of 8 GPUs. We used data parallelization in order to divide each micro-batch on the computing units. We estimated the total emissions at 580.61 kgCO2eq, using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in
+We pre-trained the model on the new CNRS (French National Centre for Scientific Research) [Jean Zay](http://www.idris.fr/eng/jean-zay/) supercomputer. We perform the training within a total of 140 hours of computation on Tesla V-100 hardware (TDP of 300W). The training was distributed on 4 compute nodes of 8 GPUs. We used data parallelization in order to divide each micro-batch on the computing units. We estimated the total emissions at 580.61 kgCO2eq, using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al., (2019)](lacoste-2019).

 ## Eval results

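As a sanity check on the 580.61 kgCO2eq figure in the hunk above, the estimate is consistent with a simple energy-times-carbon-intensity calculation over 4 nodes of 8 GPUs running for 140 hours at 300 W each. The carbon intensity used below is an assumed calculator default, not a value stated in this diff.

```python
# Back-of-the-envelope reproduction of the reported emissions estimate
# (a sketch, not the exact inputs used with the ML Impact calculator).
gpus = 4 * 8              # 4 compute nodes x 8 Tesla V-100s
hours = 140               # total training time reported above
power_kw = 0.300          # 300 W TDP per GPU
energy_kwh = gpus * hours * power_kw        # 1,344 kWh of GPU energy
carbon_intensity = 0.432                    # kgCO2eq/kWh -- assumed default, not stated in the diff
print(f"{energy_kwh * carbon_intensity:.2f} kgCO2eq")  # ~580.61, matching the figure above
```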
@@ -79,8 +79,8 @@ In line with the [WikiText](https://blog.einstein.ai/the-wikitext-long-term-depe

 ### BibTeX entry and citation info

-Along with the model
-If you use **GPT-fr** for your scientific
+Along with the model hosted by HuggingFace transformers library, we maintain a [git repository](https://github.com/AntoineSimoulin/gpt-fr).
+If you use **GPT-fr** for your scientific publications or your industrial applications, please cite the following paper:

 ```bibtex
 @inproceedings{simoulin_2020_gptfr,