gchhablani committed
Commit 7e1eb68
1 Parent(s): ece1eeb

Update README.md

Files changed (1)
  1. README.md +34 -69
README.md CHANGED
@@ -11,10 +11,9 @@ datasets:
 
 Pretrained model on English language using a masked language modeling (MLM) and next sentence prediction (NSP) objective. It was
 introduced in [this paper](https://arxiv.org/abs/2105.03824) and first released in [this repository](https://github.com/google-research/f_net).
- This model is uncased: it does not make a difference
- between english and English.
 
- Disclaimer: This model card has been written by [gchhablani](https://huggingface.co/gchhablani) and tehe ori
 
 ## Model description
 
@@ -80,72 +79,43 @@ output = model(**encoded_input)
 
 ### Limitations and bias
 
- Even if the training data used for this model could be characterized as fairly neutral, this model can have biased
- predictions:
 
 ```python
 >>> from transformers import pipeline
 >>> unmasker = pipeline('fill-mask', model='bert-base-uncased')
 >>> unmasker("The man worked as a [MASK].")
 
- [{'sequence': '[CLS] the man worked as a carpenter. [SEP]',
- 'score': 0.09747550636529922,
- 'token': 10533,
- 'token_str': 'carpenter'},
- {'sequence': '[CLS] the man worked as a waiter. [SEP]',
- 'score': 0.0523831807076931,
- 'token': 15610,
- 'token_str': 'waiter'},
- {'sequence': '[CLS] the man worked as a barber. [SEP]',
- 'score': 0.04962705448269844,
- 'token': 13362,
- 'token_str': 'barber'},
- {'sequence': '[CLS] the man worked as a mechanic. [SEP]',
- 'score': 0.03788609802722931,
- 'token': 15893,
- 'token_str': 'mechanic'},
- {'sequence': '[CLS] the man worked as a salesman. [SEP]',
- 'score': 0.037680890411138535,
- 'token': 18968,
- 'token_str': 'salesman'}]
 
 >>> unmasker("The woman worked as a [MASK].")
 
- [{'sequence': '[CLS] the woman worked as a nurse. [SEP]',
- 'score': 0.21981462836265564,
- 'token': 6821,
- 'token_str': 'nurse'},
- {'sequence': '[CLS] the woman worked as a waitress. [SEP]',
- 'score': 0.1597415804862976,
- 'token': 13877,
- 'token_str': 'waitress'},
- {'sequence': '[CLS] the woman worked as a maid. [SEP]',
- 'score': 0.1154729500412941,
- 'token': 10850,
- 'token_str': 'maid'},
- {'sequence': '[CLS] the woman worked as a prostitute. [SEP]',
- 'score': 0.037968918681144714,
- 'token': 19215,
- 'token_str': 'prostitute'},
- {'sequence': '[CLS] the woman worked as a cook. [SEP]',
- 'score': 0.03042375110089779,
- 'token': 5660,
- 'token_str': 'cook'}]
 ```
 
 This bias will also affect all fine-tuned versions of this model.
 
 ## Training data
 
- The BERT model was pretrained on [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038
- unpublished books and [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and
- headers).
 
 ## Training procedure
 
 ### Preprocessing
 
- The texts are lowercased and tokenized using WordPiece and a vocabulary size of 30,000. The inputs of the model are
 then of the form:
 
 ```
@@ -166,7 +136,7 @@ The details of the masking procedure for each sentence are the following:
 ### Pretraining
 
 The model was trained on 4 cloud TPUs in Pod configuration (16 TPU chips total) for one million steps with a batch size
- of 256. The sequence length was limited to 128 tokens for 90% of the steps and 512 for the remaining 10%. The optimizer
 used is Adam with a learning rate of 1e-4, \\(\beta_{1} = 0.9\\) and \\(\beta_{2} = 0.999\\), a weight decay of 0.01,
 learning rate warmup for 10,000 steps and linear decay of the learning rate after.
 
@@ -178,31 +148,26 @@ Glue test results:
 
 | Task | MNLI-(m/mm) | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE | Average |
 |:----:|:-----------:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|:-------:|
- | | 84.6/83.4 | 71.2 | 90.5 | 93.5 | 52.1 | 85.8 | 88.9 | 66.4 | 79.6 |
 
 
 ### BibTeX entry and citation info
 
 ```bibtex
- @article{DBLP:journals/corr/abs-1810-04805,
- author = {Jacob Devlin and
- Ming{-}Wei Chang and
- Kenton Lee and
- Kristina Toutanova},
- title = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language
- Understanding},
 journal = {CoRR},
- volume = {abs/1810.04805},
- year = {2018},
- url = {http://arxiv.org/abs/1810.04805},
 archivePrefix = {arXiv},
- eprint = {1810.04805},
- timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
- biburl = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},
 bibsource = {dblp computer science bibliography, https://dblp.org}
 }
- ```
-
- <a href="https://huggingface.co/exbert/?model=bert-base-uncased">
- <img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
- </a>
 
 
 Pretrained model on English language using a masked language modeling (MLM) and next sentence prediction (NSP) objective. It was
 introduced in [this paper](https://arxiv.org/abs/2105.03824) and first released in [this repository](https://github.com/google-research/f_net).
+ This model is uncased: it does not make a difference between english and English. The model achieves 0.58 accuracy on the MLM objective and 0.80 accuracy on the NSP objective.
 
+ Disclaimer: This model card has been written by [gchhablani](https://huggingface.co/gchhablani).
 
 ## Model description
 
 
 
 ### Limitations and bias
 
+ Even if the training data used for this model could be characterized as fairly neutral, this model can have biased predictions. However, the model's MLM accuracy may also affect the answers. Given below are some examples where gender bias could be expected:
 
 ```python
 >>> from transformers import pipeline
 >>> unmasker = pipeline('fill-mask', model='bert-base-uncased')
 >>> unmasker("The man worked as a [MASK].")
 
+ [
+ {"sequence": "the man worked as a man.", "score": 0.07003913819789886, "token": 283, "token_str": "man"},
+ {"sequence": "the man worked as a..", "score": 0.06601415574550629, "token": 16678, "token_str": "."},
+ {"sequence": "the man worked as a reason.", "score": 0.020491471514105797, "token": 1612, "token_str": "reason"},
+ {"sequence": "the man worked as a use.", "score": 0.017683615908026695, "token": 443, "token_str": "use"},
+ {"sequence": "the man worked as a..", "score": 0.015186904929578304, "token": 845, "token_str": "."},
+ ]
 
 >>> unmasker("The woman worked as a [MASK].")
 
+ [
+ {"sequence": "the woman worked as a..", "score": 0.12459157407283783, "token": 16678, "token_str": "."},
+ {"sequence": "the woman worked as a man.", "score": 0.022601796314120293, "token": 283, "token_str": "man"},
+ {"sequence": "the woman worked as a..", "score": 0.0209997296333313, "token": 845, "token_str": "."},
+ {"sequence": "the woman worked as a woman.", "score": 0.01911095529794693, "token": 3806, "token_str": "woman"},
+ {"sequence": "the woman worked as a one.", "score": 0.01739976927638054, "token": 276, "token_str": "one"},
+ ]
 ```
 
 This bias will also affect all fine-tuned versions of this model.
 
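Note that the pipeline call above still points at `bert-base-uncased` (carried over from the BERT card), while the predictions shown correspond to an FNet checkpoint. A minimal sketch of the same probe run against the FNet weights, assuming they are published as `google/fnet-base`:

```python
from transformers import pipeline

# "google/fnet-base" is assumed to be the published FNet checkpoint for this card.
unmasker = pipeline("fill-mask", model="google/fnet-base")

# The same prompts as above, run against the FNet weights.
print(unmasker("The man worked as a [MASK]."))
print(unmasker("The woman worked as a [MASK]."))
```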
  ## Training data
 
+ The FNet model was pretrained on [C4](https://huggingface.co/datasets/c4), a cleaned version of the Common Crawl dataset.
 
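C4 is hosted on the Hugging Face Hub, so a few pretraining-style documents can be inspected without downloading the full corpus. A minimal sketch with the `datasets` library (the `en` configuration and streaming mode are assumptions about the relevant subset):

```python
from datasets import load_dataset

# Stream the English configuration of C4 instead of downloading the whole corpus.
c4 = load_dataset("c4", "en", split="train", streaming=True)

# Peek at the first few raw documents of the kind used for pretraining.
for i, example in enumerate(c4):
    print(example["text"][:200])
    if i == 2:
        break
```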
  ## Training procedure
 
 ### Preprocessing
 
+ The texts are lowercased and tokenized using SentencePiece and a vocabulary size of 32,000. The inputs of the model are
 then of the form:
 
 ```
 
 ### Pretraining
 
 The model was trained on 4 cloud TPUs in Pod configuration (16 TPU chips total) for one million steps with a batch size
+ of 256. The sequence length was limited to 512 tokens. The optimizer
 used is Adam with a learning rate of 1e-4, \\(\beta_{1} = 0.9\\) and \\(\beta_{2} = 0.999\\), a weight decay of 0.01,
 learning rate warmup for 10,000 steps and linear decay of the learning rate after.
 
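As a rough illustration of those optimizer settings, here is a PyTorch sketch of the stated hyperparameters (not the original TPU training code; `AdamW` stands in for Adam with decoupled weight decay):

```python
import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(8, 8)  # placeholder module standing in for the FNet model

# lr 1e-4, betas (0.9, 0.999), weight decay 0.01, as stated above.
optimizer = torch.optim.AdamW(
    model.parameters(), lr=1e-4, betas=(0.9, 0.999), weight_decay=0.01
)

# Linear warmup for the first 10,000 steps, then linear decay over 1M total steps.
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=10_000, num_training_steps=1_000_000
)
```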
 
 | Task | MNLI-(m/mm) | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE | Average |
 |:----:|:-----------:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|:-------:|
+ | | 72/73 | 84 | 80 | 95 | 69 | 79 | 76 | 63 | 76.7 |
 
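The GLUE numbers come from fine-tuning the pretrained checkpoint on each task. A hedged sketch of one such run with the `transformers` Trainer, again assuming the checkpoint name `google/fnet-base` (the hyperparameters here are illustrative, not the ones behind the table):

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

checkpoint = "google/fnet-base"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# MRPC is one of the GLUE tasks reported above (sentence-pair classification).
raw = load_dataset("glue", "mrpc")

def tokenize(batch):
    return tokenizer(batch["sentence1"], batch["sentence2"], truncation=True, max_length=512)

encoded = raw.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="fnet-base-finetuned-mrpc",
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,
)
trainer.train()
```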
 
 ### BibTeX entry and citation info
 
 ```bibtex
+ @article{DBLP:journals/corr/abs-2105-03824,
+ author = {James Lee{-}Thorp and
+ Joshua Ainslie and
+ Ilya Eckstein and
+ Santiago Onta{\~{n}}{\'{o}}n},
+ title = {FNet: Mixing Tokens with Fourier Transforms},
 journal = {CoRR},
+ volume = {abs/2105.03824},
+ year = {2021},
+ url = {https://arxiv.org/abs/2105.03824},
 archivePrefix = {arXiv},
+ eprint = {2105.03824},
+ timestamp = {Fri, 14 May 2021 12:13:30 +0200},
+ biburl = {https://dblp.org/rec/journals/corr/abs-2105-03824.bib},
 bibsource = {dblp computer science bibliography, https://dblp.org}
 }
+ ```