gchhablani committed • Commit 7e1eb68 • 1 Parent(s): ece1eeb
Update README.md
README.md CHANGED
@@ -11,10 +11,9 @@ datasets:
- This model is uncased: it does not make a difference
- between english and English.
- Disclaimer: This model card has been written by [gchhablani](https://huggingface.co/gchhablani)
@@ -80,72 +79,43 @@ output = model(**encoded_input)
- Even if the training data used for this model could be characterized as fairly neutral, this model can have biased
- predictions:
- [
- 'token_str': 'waiter'},
- {'sequence': '[CLS] the man worked as a barber. [SEP]', 'score': 0.04962705448269844, 'token': 13362, 'token_str': 'barber'},
- {'sequence': '[CLS] the man worked as a mechanic. [SEP]', 'score': 0.03788609802722931, 'token': 15893, 'token_str': 'mechanic'},
- {'sequence': '[CLS] the man worked as a salesman. [SEP]', 'score': 0.037680890411138535, 'token': 18968, 'token_str': 'salesman'}]
- [
- 'token_str': 'waitress'},
- {'sequence': '[CLS] the woman worked as a maid. [SEP]', 'score': 0.1154729500412941, 'token': 10850, 'token_str': 'maid'},
- {'sequence': '[CLS] the woman worked as a prostitute. [SEP]', 'score': 0.037968918681144714, 'token': 19215, 'token_str': 'prostitute'},
- {'sequence': '[CLS] the woman worked as a cook. [SEP]', 'score': 0.03042375110089779, 'token': 5660, 'token_str': 'cook'}]
- The
- unpublished books and [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and
- headers).
- The texts are lowercased and tokenized using
@@ -166,7 +136,7 @@ The details of the masking procedure for each sentence are the following:
- of 256. The sequence length was limited to
@@ -178,31 +148,26 @@ Glue test results:
- | |
- @article{DBLP:journals/corr/abs-
- author = {
- title = {
- Understanding},
- volume = {abs/
- year = {
- url = {
- eprint = {
- timestamp = {
- biburl = {https://dblp.org/rec/journals/corr/abs-
- <a href="https://huggingface.co/exbert/?model=bert-base-uncased">
- <img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
- </a>

Pretrained model on the English language using a masked language modeling (MLM) and next sentence prediction (NSP) objective. It was
introduced in [this paper](https://arxiv.org/abs/2105.03824) and first released in [this repository](https://github.com/google-research/f_net).
This model is uncased: it does not make a difference between english and English. The model achieves 0.58 accuracy on the MLM objective and 0.80 accuracy on the NSP objective.

Disclaimer: This model card has been written by [gchhablani](https://huggingface.co/gchhablani).
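As a quick illustration of the NSP objective described above, the sketch below scores whether one sentence follows another. This snippet is not part of the original card: it assumes the checkpoint is published under the `google/fnet-base` id and that the `FNetForNextSentencePrediction` class from `transformers` is available.

```python
# Sketch only: scoring the NSP objective with FNet.
# Assumes the checkpoint id `google/fnet-base` (not stated in this card) and
# the `FNetForNextSentencePrediction` class from `transformers`.
import torch
from transformers import FNetForNextSentencePrediction, FNetTokenizer

tokenizer = FNetTokenizer.from_pretrained("google/fnet-base")
model = FNetForNextSentencePrediction.from_pretrained("google/fnet-base")

sentence_a = "The quick brown fox jumped over the lazy dog."
sentence_b = "It then trotted away into the forest."
inputs = tokenizer(sentence_a, sentence_b, return_tensors="pt")

with torch.no_grad():
    outputs = model(
        input_ids=inputs["input_ids"],
        token_type_ids=inputs["token_type_ids"],
    )

# Index 0 corresponds to "sentence B follows sentence A".
print(torch.softmax(outputs.logits, dim=-1))
```

A higher probability at index 0 indicates that the model considers the second sentence a plausible continuation of the first.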

## Model description

### Limitations and bias

Even if the training data used for this model could be characterized as fairly neutral, this model can have biased predictions. However, the model's MLM accuracy may also affect the quality of the predictions. Given below are some examples where gender bias could be expected:

```python
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='google/fnet-base')
>>> unmasker("The man worked as a [MASK].")

[
    {"sequence": "the man worked as a man.", "score": 0.07003913819789886, "token": 283, "token_str": "man"},
    {"sequence": "the man worked as a..", "score": 0.06601415574550629, "token": 16678, "token_str": "."},
    {"sequence": "the man worked as a reason.", "score": 0.020491471514105797, "token": 1612, "token_str": "reason"},
    {"sequence": "the man worked as a use.", "score": 0.017683615908026695, "token": 443, "token_str": "use"},
    {"sequence": "the man worked as a..", "score": 0.015186904929578304, "token": 845, "token_str": "."},
]

>>> unmasker("The woman worked as a [MASK].")

[
    {"sequence": "the woman worked as a..", "score": 0.12459157407283783, "token": 16678, "token_str": "."},
    {"sequence": "the woman worked as a man.", "score": 0.022601796314120293, "token": 283, "token_str": "man"},
    {"sequence": "the woman worked as a..", "score": 0.0209997296333313, "token": 845, "token_str": "."},
    {"sequence": "the woman worked as a woman.", "score": 0.01911095529794693, "token": 3806, "token_str": "woman"},
    {"sequence": "the woman worked as a one.", "score": 0.01739976927638054, "token": 276, "token_str": "one"},
]
```

This bias will also affect all fine-tuned versions of this model.

## Training data

The FNet model was pretrained on [C4](https://huggingface.co/datasets/c4), a cleaned version of the Common Crawl dataset.
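For reference, here is a minimal sketch (not from the original card) of peeking at a few C4 documents with the `datasets` library; it assumes the `en` configuration of the `c4` dataset on the Hub and uses streaming to avoid downloading the full corpus.

```python
# Sketch only: stream a few training documents from C4.
# Assumes the `c4` dataset on the Hugging Face Hub with its `en` configuration.
from datasets import load_dataset

c4 = load_dataset("c4", "en", split="train", streaming=True)

for i, example in enumerate(c4):
    print(example["text"][:100])  # each example is a raw web document
    if i == 2:
        break
```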

## Training procedure

### Preprocessing

The texts are lowercased and tokenized using SentencePiece and a vocabulary size of 32,000. The inputs of the model are
then of the form:

```
[CLS] Sentence A [SEP] Sentence B [SEP]
```

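A small sketch, not part of the original card, of how the tokenizer produces this format; it assumes the `google/fnet-base` checkpoint id.

```python
# Sketch only: inspect the SentencePiece tokenization of a sentence pair.
# Assumes the checkpoint id `google/fnet-base`.
from transformers import FNetTokenizer

tokenizer = FNetTokenizer.from_pretrained("google/fnet-base")
print(tokenizer.vocab_size)  # expected to be 32000 for this checkpoint

encoded = tokenizer("Sentence A", "Sentence B")
# Decoding shows the "[CLS] ... [SEP] ... [SEP]" layout described above;
# exact casing and spacing depend on the tokenizer settings.
print(tokenizer.decode(encoded["input_ids"]))
```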
### Pretraining

The model was trained on 4 cloud TPUs in Pod configuration (16 TPU chips total) for one million steps with a batch size
of 256. The sequence length was limited to 512 tokens. The optimizer
used is Adam with a learning rate of 1e-4, \\(\beta_{1} = 0.9\\) and \\(\beta_{2} = 0.999\\), a weight decay of 0.01,
learning rate warmup for 10,000 steps and linear decay of the learning rate after.
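The optimizer settings above can be approximated in PyTorch as sketched below. This is an illustration only, not the authors' training code; the randomly initialised model exists purely so there are parameters to optimize.

```python
# Sketch only: the Adam settings and linear warmup/decay described above,
# expressed with PyTorch and transformers' scheduler helper. This is an
# approximation, not the authors' training setup.
import torch
from transformers import FNetConfig, FNetForPreTraining, get_linear_schedule_with_warmup

model = FNetForPreTraining(FNetConfig())  # default config is roughly base-sized

optimizer = torch.optim.AdamW(
    model.parameters(), lr=1e-4, betas=(0.9, 0.999), weight_decay=0.01
)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=10_000, num_training_steps=1_000_000
)

# A training step would then do:
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```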

Glue test results:

| Task | MNLI-(m/mm) | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE | Average |
|:----:|:-----------:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|:-------:|
| | 72/73 | 84 | 80 | 95 | 69 | 79 | 76 | 63 | 76.7 |

### BibTeX entry and citation info

```bibtex
@article{DBLP:journals/corr/abs-2105-03824,
  author    = {James Lee{-}Thorp and
               Joshua Ainslie and
               Ilya Eckstein and
               Santiago Onta{\~{n}}{\'{o}}n},
  title     = {FNet: Mixing Tokens with Fourier Transforms},
  journal   = {CoRR},
  volume    = {abs/2105.03824},
  year      = {2021},
  url       = {https://arxiv.org/abs/2105.03824},
  archivePrefix = {arXiv},
  eprint    = {2105.03824},
  timestamp = {Fri, 14 May 2021 12:13:30 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2105-03824.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```