fix typos
README.md CHANGED
@@ -20,7 +20,7 @@ license: apache-2.0
 | Model name | Number of layers | Attention Heads | Embedding Dimension | Total Parameters |
 | :------: | :---: | :---: | :---: | :---: |
 | `gpt-fr-cased-small` | 12 | 12 | 768 | 124 M |
-| `gpt-fr-cased-base` | 24 | 14 |
+| `gpt-fr-cased-base` | 24 | 14 | 1,792 | 1,017 B |

 ## Intended uses & limitations

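As a quick check of the table in the hunk above, the architecture hyper-parameters can be read directly from the published model configs with the transformers library. This is a minimal sketch; the hub ids `asi/gpt-fr-cased-small` and `asi/gpt-fr-cased-base` are assumptions mirroring the model names in the table, not something stated in this commit.

```python
# Minimal sketch: read the architecture hyper-parameters from the model configs.
# The hub ids below are assumptions, not specified by this commit.
from transformers import AutoConfig

for model_id in ("asi/gpt-fr-cased-small", "asi/gpt-fr-cased-base"):
    cfg = AutoConfig.from_pretrained(model_id)
    # For the small model this should print 12 layers, 12 heads and a 768-dim embedding,
    # matching the corresponding row of the table above.
    print(model_id, cfg.n_layer, cfg.n_head, cfg.n_embd)
```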
@@ -70,7 +70,7 @@ We created a dedicated corpus to train our generative model. Indeed the model us

 ## Training procedure

-We pre-trained the model on the new CNRS (French National Centre for Scientific Research) [Jean Zay](http://www.idris.fr/eng/jean-zay/) supercomputer. We perform the training within a total of 140 hours of computation on Tesla V-100 hardware (TDP of 300W). The training was distributed on 4 compute nodes of 8 GPUs. We used data parallelization in order to divide each micro-batch on the computing units. We estimated the total emissions at 580.61 kgCO2eq, using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in
+We pre-trained the model on the new CNRS (French National Centre for Scientific Research) [Jean Zay](http://www.idris.fr/eng/jean-zay/) supercomputer. We perform the training within a total of 140 hours of computation on Tesla V-100 hardware (TDP of 300W). The training was distributed on 4 compute nodes of 8 GPUs. We used data parallelization in order to divide each micro-batch on the computing units. We estimated the total emissions at 580.61 kgCO2eq, using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al., (2019)](lacoste-2019).

 ## Eval results

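As a sanity check on the 580.61 kgCO2eq figure in the hunk above, the estimate is consistent with a simple energy-times-carbon-intensity calculation over 4 nodes of 8 GPUs running for 140 hours at 300 W each. The carbon intensity used below is an assumed calculator default, not a value stated in this diff.

```python
# Back-of-the-envelope reproduction of the reported emissions estimate
# (a sketch, not the exact inputs used with the ML Impact calculator).
gpus = 4 * 8              # 4 compute nodes x 8 Tesla V-100s
hours = 140               # total training time reported above
power_kw = 0.300          # 300 W TDP per GPU
energy_kwh = gpus * hours * power_kw        # 1,344 kWh of GPU energy
carbon_intensity = 0.432                    # kgCO2eq/kWh -- assumed default, not stated in the diff
print(f"{energy_kwh * carbon_intensity:.2f} kgCO2eq")  # ~580.61, matching the figure above
```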
@@ -79,8 +79,8 @@ In line with the [WikiText](https://blog.einstein.ai/the-wikitext-long-term-depe

 ### BibTeX entry and citation info

-Along with the model
-If you use **GPT-fr** for your scientific
+Along with the model hosted by HuggingFace transformers library, we maintain a [git repository](https://github.com/AntoineSimoulin/gpt-fr).
+If you use **GPT-fr** for your scientific publications or your industrial applications, please cite the following paper:

 ```bibtex
 @inproceedings{simoulin_2020_gptfr,