sgugger Marissa committed on
Commit c114932 · 1 Parent(s): ec58a5b

Update model card (#1)

- Update model card (ae7a275b1e90a24f2e25105eaeea6a2015953a45)


Co-authored-by: Marissa Gerchick <Marissa@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +142 -7
README.md CHANGED
@@ -8,7 +8,22 @@ datasets:
8
  - openwebtext
9
  ---
10
 
11
- # DistilRoBERTa base model
12
 
13
  This model is a distilled version of the [RoBERTa-base model](https://huggingface.co/roberta-base). It follows the same training procedure as [DistilBERT](https://huggingface.co/distilbert-base-uncased).
14
  The code for the distillation process can be found [here](https://github.com/huggingface/transformers/tree/master/examples/distillation).
@@ -17,15 +32,92 @@ This model is case-sensitive: it makes a difference between english and English.
17
  The model has 6 layers, a hidden dimension of 768, and 12 attention heads, totaling 82M parameters (compared to 125M parameters for RoBERTa-base).
18
  On average, DistilRoBERTa is twice as fast as RoBERTa-base.
19
 
20
- We encourage to check [RoBERTa-base model](https://huggingface.co/roberta-base) to know more about usage, limitations and potential biases.
21
 
22
- ## Training data
23
 
24
- DistilRoBERTa was pre-trained on [OpenWebTextCorpus](https://skylion007.github.io/OpenWebTextCorpus/), a reproduction of OpenAI's WebText dataset (it is ~4 times less training data than the teacher RoBERTa).
25
 
26
- ## Evaluation results
27
 
28
- When fine-tuned on downstream tasks, this model achieves the following results:
29
 
30
  Glue test results:
31
 
@@ -33,7 +125,17 @@ Glue test results:
33
  |:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|
34
  | | 84.0 | 89.4 | 90.8 | 92.5 | 59.3 | 88.3 | 86.6 | 67.9 |
35
 
36
- ### BibTeX entry and citation info
37
 
38
  ```bibtex
39
  @article{Sanh2019DistilBERTAD,
@@ -45,6 +147,39 @@ Glue test results:
45
  }
46
  ```
47
 
48
  <a href="https://huggingface.co/exbert/?model=distilroberta-base">
49
  <img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
50
  </a>
 
8
  - openwebtext
9
  ---
10
 
11
+ # Model Card for DistilRoBERTa base
12
+
13
+ # Table of Contents
14
+
15
+ 1. [Model Details](#model-details)
16
+ 2. [Uses](#uses)
17
+ 3. [Bias, Risks, and Limitations](#bias-risks-and-limitations)
18
+ 4. [Training Details](#training-details)
19
+ 5. [Evaluation](#evaluation)
20
+ 6. [Environmental Impact](#environmental-impact)
21
+ 7. [Citation](#citation)
22
+ 8. [How to Get Started With the Model](#how-to-get-started-with-the-model)
23
+
24
+ # Model Details
25
+
26
+ ## Model Description
27
 
28
  This model is a distilled version of the [RoBERTa-base model](https://huggingface.co/roberta-base). It follows the same training procedure as [DistilBERT](https://huggingface.co/distilbert-base-uncased).
29
  The code for the distillation process can be found [here](https://github.com/huggingface/transformers/tree/master/examples/distillation).
 
32
  The model has 6 layers, a hidden dimension of 768, and 12 attention heads, totaling 82M parameters (compared to 125M parameters for RoBERTa-base).
33
  On average, DistilRoBERTa is twice as fast as RoBERTa-base.
34
 
35
+ We encourage users of this model card to check out the [RoBERTa-base model card](https://huggingface.co/roberta-base) to learn more about usage, limitations and potential biases.
36
+
37
+ - **Developed by:** Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf (Hugging Face)
38
+ - **Model type:** Transformer-based language model
39
+ - **Language(s) (NLP):** English
40
+ - **License:** Apache 2.0
41
+ - **Related Models:** [RoBERTa-base model card](https://huggingface.co/roberta-base)
42
+ - **Resources for more information:**
43
+   - [GitHub Repository](https://github.com/huggingface/transformers/blob/main/examples/research_projects/distillation/README.md)
44
+   - [Associated Paper](https://arxiv.org/abs/1910.01108)
45
+
46
+ # Uses
47
+
48
+ ## Direct Use and Downstream Use
49
+
50
+ You can use the raw model for masked language modeling, but it's mostly intended to be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=roberta) to look for fine-tuned versions on a task that interests you.
51
+
52
+ Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation, you should look at a model like GPT-2.
53
+
54
+ ## Out of Scope Use
55
+
56
+ The model should not be used to intentionally create hostile or alienating environments for people. The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.
57
+
58
+ # Bias, Risks, and Limitations
59
+
60
+ Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. For example:
61
 
62
+ ```python
63
+ >>> from transformers import pipeline
64
+ >>> unmasker = pipeline('fill-mask', model='distilroberta-base')
65
+ >>> unmasker("The man worked as a <mask>.")
66
+ [{'score': 0.1237526461482048,
67
+ 'sequence': 'The man worked as a waiter.',
68
+ 'token': 38233,
69
+ 'token_str': ' waiter'},
70
+ {'score': 0.08968018740415573,
71
+ 'sequence': 'The man worked as a waitress.',
72
+ 'token': 35698,
73
+ 'token_str': ' waitress'},
74
+ {'score': 0.08387645334005356,
75
+ 'sequence': 'The man worked as a bartender.',
76
+ 'token': 33080,
77
+ 'token_str': ' bartender'},
78
+ {'score': 0.061059024184942245,
79
+ 'sequence': 'The man worked as a mechanic.',
80
+ 'token': 25682,
81
+ 'token_str': ' mechanic'},
82
+ {'score': 0.03804653510451317,
83
+ 'sequence': 'The man worked as a courier.',
84
+ 'token': 37171,
85
+ 'token_str': ' courier'}]
86
+
87
+ >>> unmasker("The woman worked as a <mask>.")
88
+ [{'score': 0.23149248957633972,
89
+ 'sequence': 'The woman worked as a waitress.',
90
+ 'token': 35698,
91
+ 'token_str': ' waitress'},
92
+ {'score': 0.07563332468271255,
93
+ 'sequence': 'The woman worked as a waiter.',
94
+ 'token': 38233,
95
+ 'token_str': ' waiter'},
96
+ {'score': 0.06983394920825958,
97
+ 'sequence': 'The woman worked as a bartender.',
98
+ 'token': 33080,
99
+ 'token_str': ' bartender'},
100
+ {'score': 0.05411609262228012,
101
+ 'sequence': 'The woman worked as a nurse.',
102
+ 'token': 9008,
103
+ 'token_str': ' nurse'},
104
+ {'score': 0.04995106905698776,
105
+ 'sequence': 'The woman worked as a maid.',
106
+ 'token': 29754,
107
+ 'token_str': ' maid'}]
108
+ ```
109
+
110
+ ## Recommendations
111
+
112
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
113
 
114
+ # Training Details
115
 
116
+ DistilRoBERTa was pre-trained on [OpenWebTextCorpus](https://skylion007.github.io/OpenWebTextCorpus/), a reproduction of OpenAI's WebText dataset (about 4 times less training data than was used for the teacher, RoBERTa). See the [roberta-base model card](https://huggingface.co/roberta-base/blob/main/README.md) for further details on training.
117
 
118
+ # Evaluation
119
+
120
+ When fine-tuned on downstream tasks, this model achieves the following results (see [GitHub Repo](https://github.com/huggingface/transformers/blob/main/examples/research_projects/distillation/README.md)):
121
 
122
  Glue test results:
123
 
 
125
  |:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|
126
  | | 84.0 | 89.4 | 90.8 | 92.5 | 59.3 | 88.3 | 86.6 | 67.9 |
127
 
128
+ # Environmental Impact
129
+
130
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
131
+
132
+ - **Hardware Type:** More information needed
133
+ - **Hours used:** More information needed
134
+ - **Cloud Provider:** More information needed
135
+ - **Compute Region:** More information needed
136
+ - **Carbon Emitted:** More information needed
137
+
138
+ # Citation
139
 
140
  ```bibtex
141
  @article{Sanh2019DistilBERTAD,
 
147
  }
148
  ```
149
 
150
+ APA
151
+ - Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.
152
+
153
+ # How to Get Started With the Model
154
+
155
+ You can use the model directly with a pipeline for masked language modeling:
156
+
157
+ ```python
158
+ >>> from transformers import pipeline
159
+ >>> unmasker = pipeline('fill-mask', model='distilroberta-base')
160
+ >>> unmasker("Hello I'm a <mask> model.")
161
+ [{'score': 0.04673689603805542,
162
+ 'sequence': "Hello I'm a business model.",
163
+ 'token': 265,
164
+ 'token_str': ' business'},
165
+ {'score': 0.03846118599176407,
166
+ 'sequence': "Hello I'm a freelance model.",
167
+ 'token': 18150,
168
+ 'token_str': ' freelance'},
169
+ {'score': 0.03308931365609169,
170
+ 'sequence': "Hello I'm a fashion model.",
171
+ 'token': 2734,
172
+ 'token_str': ' fashion'},
173
+ {'score': 0.03018997237086296,
174
+ 'sequence': "Hello I'm a role model.",
175
+ 'token': 774,
176
+ 'token_str': ' role'},
177
+ {'score': 0.02111748233437538,
178
+ 'sequence': "Hello I'm a Playboy model.",
179
+ 'token': 24526,
180
+ 'token_str': ' Playboy'}]
181
+ ```
182
+
183
  <a href="https://huggingface.co/exbert/?model=distilroberta-base">
184
  <img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
185
  </a>
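
The Model Description in the updated card quotes 6 layers, a dimension of 768 and 12 heads for 82M parameters, against 125M for RoBERTa-base. As a rough sanity check, the sketch below recomputes both totals by hand. The vocabulary size, number of position embeddings, token-type vocabulary size and feed-forward width are not stated in the card; they are assumptions taken from the published roberta-base configuration. The count covers embeddings, encoder layers and the pooler, and the helper names are purely illustrative.

```python
# Back-of-the-envelope parameter count for a RoBERTa-style encoder.
# Values marked "assumed" are NOT stated in the model card; they are
# taken from the published roberta-base configuration.
VOCAB_SIZE = 50265      # assumed: BPE vocabulary size
MAX_POSITIONS = 514     # assumed: learned position embeddings
TYPE_VOCAB_SIZE = 1     # assumed: a single token-type embedding
INTERMEDIATE = 3072     # assumed: feed-forward inner size (4 x hidden)
HIDDEN = 768            # from the card


def linear(n_in, n_out):
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(n):
    """Scale and shift vectors of a LayerNorm."""
    return 2 * n


def encoder_params(num_layers, hidden=HIDDEN):
    """Embeddings + transformer layers + pooler (no LM head)."""
    embeddings = (VOCAB_SIZE + MAX_POSITIONS + TYPE_VOCAB_SIZE) * hidden
    embeddings += layer_norm(hidden)
    # Query, key, value and output projections, then a LayerNorm.
    attention = 4 * linear(hidden, hidden) + layer_norm(hidden)
    # Two dense layers, then a LayerNorm.
    feed_forward = (linear(hidden, INTERMEDIATE)
                    + linear(INTERMEDIATE, hidden)
                    + layer_norm(hidden))
    pooler = linear(hidden, hidden)
    return embeddings + num_layers * (attention + feed_forward) + pooler


student = encoder_params(6)    # DistilRoBERTa: 6 layers
teacher = encoder_params(12)   # RoBERTa-base: 12 layers
print(f"6 layers:  {student / 1e6:.1f}M parameters")
print(f"12 layers: {teacher / 1e6:.1f}M parameters")
```

Under these assumptions the totals come to about 82.1M and 124.6M, consistent with the rounded figures in the card. The number of attention heads does not enter the count, because the heads split the same 768-wide projections. Since the embedding table is shared, halving the layer count removes only about a third of the parameters.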