Marissa committed
Commit 0ad01e2
1 Parent(s): 632214e

Update README.md

Files changed (1)
  1. README.md +10 -3
README.md CHANGED
@@ -31,9 +31,10 @@ license: cc-by-nc-4.0
 4. [Training](#training)
 5. [Evaluation](#evaluation)
 6. [Environmental Impact](#environmental-impact)
-7. [Citation](#citation)
-8. [Model Card Authors](#model-card-authors)
-9. [How To Get Started With the Model](#how-to-get-started-with-the-model)
+7. [Technical Specifications](#technical-specifications)
+8. [Citation](#citation)
+9. [Model Card Authors](#model-card-authors)
+10. [How To Get Started With the Model](#how-to-get-started-with-the-model)
 
 
 # Model Details
@@ -77,6 +78,8 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 # Training
 
 This model is the XLM model trained on text in 17 languages. The preprocessing included tokenization and byte-pair-encoding. See the [GitHub repo](https://github.com/facebookresearch/XLM#the-17-and-100-languages) and the [associated paper](https://arxiv.org/pdf/1911.02116.pdf) for further details on the training data and training procedure.
+
+[Conneau et al. (2020)](https://arxiv.org/pdf/1911.02116.pdf) report that this model has 16 layers, 1280 hidden states, 16 attention heads, and the dimension of the feed-forward layer is 1520. The vocabulary size is 200k and the total number of parameters is 570M (see Table 7).
 
 # Evaluation
 
@@ -104,6 +107,10 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 - **Compute Region:** More information needed
 - **Carbon Emitted:** More information needed
 
+# Technical Specifications
+
+[Conneau et al. (2020)](https://arxiv.org/pdf/1911.02116.pdf) report that this model has 16 layers, 1280 hidden states, 16 attention heads, and the dimension of the feed-forward layer is 1520. The vocabulary size is 200k and the total number of parameters is 570M (see Table 7).
+
 # Citation
 
 **BibTeX:**
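For context on the Training paragraph in the diff above: the preprocessing it mentions (tokenization plus byte-pair encoding) is bundled into the tokenizer released with the checkpoint. A minimal sketch, assuming the card describes the 17-language, 1280-hidden-state XLM published as `xlm-mlm-17-1280` (an assumption; the diff never names the model repository):

```python
# Sketch: XLM preprocessing (tokenization + BPE) via transformers.
# The checkpoint name xlm-mlm-17-1280 is an assumption; the diff
# does not name the model repository.
from transformers import XLMTokenizer

tokenizer = XLMTokenizer.from_pretrained("xlm-mlm-17-1280")

# tokenize() applies word tokenization followed by BPE merges;
# '</w>' marks the end of a word in XLM's BPE vocabulary.
print(tokenizer.tokenize("Hello, how are you?"))
# e.g. ['hello</w>', ',</w>', 'how</w>', 'are</w>', 'you</w>', '?</w>']
```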
 
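The figures added in the new Technical Specifications section can likewise be cross-checked against the checkpoint's configuration. A sketch under the same checkpoint-name assumption:

```python
# Sketch: compare the card's reported architecture against the checkpoint
# config. Again assumes the repo id xlm-mlm-17-1280.
from transformers import XLMConfig, XLMWithLMHeadModel

config = XLMConfig.from_pretrained("xlm-mlm-17-1280")
print(config.n_layers)    # number of layers   (card reports 16)
print(config.emb_dim)     # hidden size        (card reports 1280)
print(config.n_heads)     # attention heads    (card reports 16)
print(config.vocab_size)  # vocabulary size    (card reports 200k)

# Total parameter count (card reports 570M, citing Table 7 of
# Conneau et al. 2020); downloads the full weights.
model = XLMWithLMHeadModel.from_pretrained("xlm-mlm-17-1280")
print(f"{sum(p.numel() for p in model.parameters()):,}")
```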