Marissa committed
Commit 0ad01e2
1 Parent(s): 632214e

Update README.md

Files changed (1)
  1. README.md +10 -3
README.md CHANGED
@@ -31,9 +31,10 @@ license: cc-by-nc-4.0
 4. [Training](#training)
 5. [Evaluation](#evaluation)
 6. [Environmental Impact](#environmental-impact)
-7. [Citation](#citation)
-8. [Model Card Authors](#model-card-authors)
-9. [How To Get Started With the Model](#how-to-get-started-with-the-model)
+7. [Technical Specifications](#technical-specifications)
+8. [Citation](#citation)
+9. [Model Card Authors](#model-card-authors)
+10. [How To Get Started With the Model](#how-to-get-started-with-the-model)
 
 
 # Model Details
@@ -77,6 +78,8 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 # Training
 
 This model is the XLM model trained on text in 17 languages. The preprocessing included tokenization and byte-pair-encoding. See the [GitHub repo](https://github.com/facebookresearch/XLM#the-17-and-100-languages) and the [associated paper](https://arxiv.org/pdf/1911.02116.pdf) for further details on the training data and training procedure.
+
+[Conneau et al. (2020)](https://arxiv.org/pdf/1911.02116.pdf) report that this model has 16 layers, 1280 hidden states, 16 attention heads, and the dimension of the feed-forward layer is 1520. The vocabulary size is 200k and the total number of parameters is 570M (see Table 7).
 
 # Evaluation
 
@@ -104,6 +107,10 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 - **Compute Region:** More information needed
 - **Carbon Emitted:** More information needed
 
+# Technical Specifications
+
+[Conneau et al. (2020)](https://arxiv.org/pdf/1911.02116.pdf) report that this model has 16 layers, 1280 hidden states, 16 attention heads, and the dimension of the feed-forward layer is 1520. The vocabulary size is 200k and the total number of parameters is 570M (see Table 7).
+
 # Citation
 
 **BibTeX:**
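For context on the Training paragraph in the diff above: the preprocessing it mentions (tokenization plus byte-pair encoding) is bundled into the tokenizer released with the checkpoint. A minimal sketch, assuming the card describes the 17-language, 1280-hidden-state XLM published as `xlm-mlm-17-1280` (an assumption; the diff never names the model repository):

```python
# Sketch: XLM preprocessing (tokenization + BPE) via transformers.
# The checkpoint name xlm-mlm-17-1280 is an assumption; the diff
# does not name the model repository.
from transformers import XLMTokenizer

tokenizer = XLMTokenizer.from_pretrained("xlm-mlm-17-1280")

# tokenize() applies word tokenization followed by BPE merges;
# '</w>' marks the end of a word in XLM's BPE vocabulary.
print(tokenizer.tokenize("Hello, how are you?"))
# e.g. ['hello</w>', ',</w>', 'how</w>', 'are</w>', 'you</w>', '?</w>']
```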
 
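The figures added in the new Technical Specifications section can likewise be cross-checked against the checkpoint's configuration. A sketch under the same checkpoint-name assumption:

```python
# Sketch: compare the card's reported architecture against the checkpoint
# config. Again assumes the repo id xlm-mlm-17-1280.
from transformers import XLMConfig, XLMWithLMHeadModel

config = XLMConfig.from_pretrained("xlm-mlm-17-1280")
print(config.n_layers)    # number of layers   (card reports 16)
print(config.emb_dim)     # hidden size        (card reports 1280)
print(config.n_heads)     # attention heads    (card reports 16)
print(config.vocab_size)  # vocabulary size    (card reports 200k)

# Total parameter count (card reports 570M, citing Table 7 of
# Conneau et al. 2020); downloads the full weights.
model = XLMWithLMHeadModel.from_pretrained("xlm-mlm-17-1280")
print(f"{sum(p.numel() for p in model.parameters()):,}")
```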