gchhablani committed
Commit 7e1eb68
1 Parent(s): ece1eeb

Update README.md

Files changed (1)
  1. README.md +34 -69
README.md CHANGED
@@ -11,10 +11,9 @@ datasets:
 
 Pretrained model on English language using a masked language modeling (MLM) and next sentence prediction (NSP) objective. It was
 introduced in [this paper](https://arxiv.org/abs/2105.03824) and first released in [this repository](https://github.com/google-research/f_net).
- This model is uncased: it does not make a difference
- between english and English.
 
- Disclaimer: This model card has been written by [gchhablani](https://huggingface.co/gchhablani) and tehe ori
 
 ## Model description
 
@@ -80,72 +79,43 @@ output = model(**encoded_input)
 
 ### Limitations and bias
 
- Even if the training data used for this model could be characterized as fairly neutral, this model can have biased
- predictions:
 
 ```python
 >>> from transformers import pipeline
 >>> unmasker = pipeline('fill-mask', model='bert-base-uncased')
 >>> unmasker("The man worked as a [MASK].")
 
- [{'sequence': '[CLS] the man worked as a carpenter. [SEP]',
- 'score': 0.09747550636529922,
- 'token': 10533,
- 'token_str': 'carpenter'},
- {'sequence': '[CLS] the man worked as a waiter. [SEP]',
- 'score': 0.0523831807076931,
- 'token': 15610,
- 'token_str': 'waiter'},
- {'sequence': '[CLS] the man worked as a barber. [SEP]',
- 'score': 0.04962705448269844,
- 'token': 13362,
- 'token_str': 'barber'},
- {'sequence': '[CLS] the man worked as a mechanic. [SEP]',
- 'score': 0.03788609802722931,
- 'token': 15893,
- 'token_str': 'mechanic'},
- {'sequence': '[CLS] the man worked as a salesman. [SEP]',
- 'score': 0.037680890411138535,
- 'token': 18968,
- 'token_str': 'salesman'}]
 
 >>> unmasker("The woman worked as a [MASK].")
 
- [{'sequence': '[CLS] the woman worked as a nurse. [SEP]',
- 'score': 0.21981462836265564,
- 'token': 6821,
- 'token_str': 'nurse'},
- {'sequence': '[CLS] the woman worked as a waitress. [SEP]',
- 'score': 0.1597415804862976,
- 'token': 13877,
- 'token_str': 'waitress'},
- {'sequence': '[CLS] the woman worked as a maid. [SEP]',
- 'score': 0.1154729500412941,
- 'token': 10850,
- 'token_str': 'maid'},
- {'sequence': '[CLS] the woman worked as a prostitute. [SEP]',
- 'score': 0.037968918681144714,
- 'token': 19215,
- 'token_str': 'prostitute'},
- {'sequence': '[CLS] the woman worked as a cook. [SEP]',
- 'score': 0.03042375110089779,
- 'token': 5660,
- 'token_str': 'cook'}]
 ```
 
 This bias will also affect all fine-tuned versions of this model.
 
 ## Training data
 
- The BERT model was pretrained on [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038
- unpublished books and [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and
- headers).
 
 ## Training procedure
 
 ### Preprocessing
 
- The texts are lowercased and tokenized using WordPiece and a vocabulary size of 30,000. The inputs of the model are
 then of the form:
 
 ```
@@ -166,7 +136,7 @@ The details of the masking procedure for each sentence are the following:
 ### Pretraining
 
 The model was trained on 4 cloud TPUs in Pod configuration (16 TPU chips total) for one million steps with a batch size
- of 256. The sequence length was limited to 128 tokens for 90% of the steps and 512 for the remaining 10%. The optimizer
 used is Adam with a learning rate of 1e-4, \\(\beta_{1} = 0.9\\) and \\(\beta_{2} = 0.999\\), a weight decay of 0.01,
 learning rate warmup for 10,000 steps and linear decay of the learning rate after.
 
@@ -178,31 +148,26 @@ Glue test results:
 
 | Task | MNLI-(m/mm) | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE | Average |
 |:----:|:-----------:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|:-------:|
- | | 84.6/83.4 | 71.2 | 90.5 | 93.5 | 52.1 | 85.8 | 88.9 | 66.4 | 79.6 |
 
 
 ### BibTeX entry and citation info
 
 ```bibtex
- @article{DBLP:journals/corr/abs-1810-04805,
- author = {Jacob Devlin and
- Ming{-}Wei Chang and
- Kenton Lee and
- Kristina Toutanova},
- title = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language
- Understanding},
 journal = {CoRR},
- volume = {abs/1810.04805},
- year = {2018},
- url = {http://arxiv.org/abs/1810.04805},
 archivePrefix = {arXiv},
- eprint = {1810.04805},
- timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
- biburl = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},
 bibsource = {dblp computer science bibliography, https://dblp.org}
 }
- ```
-
- <a href="https://huggingface.co/exbert/?model=bert-base-uncased">
- <img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
- </a>
 
 
 Pretrained model on English language using a masked language modeling (MLM) and next sentence prediction (NSP) objective. It was
 introduced in [this paper](https://arxiv.org/abs/2105.03824) and first released in [this repository](https://github.com/google-research/f_net).
+ This model is uncased: it does not make a difference between english and English. The model achieves 0.58 accuracy on the MLM objective and 0.80 accuracy on the NSP objective.
 
+ Disclaimer: This model card has been written by [gchhablani](https://huggingface.co/gchhablani).
 
 ## Model description
 
 
 
 ### Limitations and bias
 
+ Even if the training data used for this model could be characterized as fairly neutral, this model can have biased predictions. However, the model's MLM accuracy may also affect the answers. Given below are some examples where gender bias could be expected:
 
 ```python
 >>> from transformers import pipeline
 >>> unmasker = pipeline('fill-mask', model='bert-base-uncased')
 >>> unmasker("The man worked as a [MASK].")
 
+ [
+ {"sequence": "the man worked as a man.", "score": 0.07003913819789886, "token": 283, "token_str": "man"},
+ {"sequence": "the man worked as a..", "score": 0.06601415574550629, "token": 16678, "token_str": "."},
+ {"sequence": "the man worked as a reason.", "score": 0.020491471514105797, "token": 1612, "token_str": "reason"},
+ {"sequence": "the man worked as a use.", "score": 0.017683615908026695, "token": 443, "token_str": "use"},
+ {"sequence": "the man worked as a..", "score": 0.015186904929578304, "token": 845, "token_str": "."},
+ ]
 
 >>> unmasker("The woman worked as a [MASK].")
 
+ [
+ {"sequence": "the woman worked as a..", "score": 0.12459157407283783, "token": 16678, "token_str": "."},
+ {"sequence": "the woman worked as a man.", "score": 0.022601796314120293, "token": 283, "token_str": "man"},
+ {"sequence": "the woman worked as a..", "score": 0.0209997296333313, "token": 845, "token_str": "."},
+ {"sequence": "the woman worked as a woman.", "score": 0.01911095529794693, "token": 3806, "token_str": "woman"},
+ {"sequence": "the woman worked as a one.", "score": 0.01739976927638054, "token": 276, "token_str": "one"},
+ ]
 ```
 
 This bias will also affect all fine-tuned versions of this model.
 
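Note that the pipeline call above still points at `bert-base-uncased` (carried over from the BERT card), while the predictions shown correspond to an FNet checkpoint. A minimal sketch of the same probe run against the FNet weights, assuming they are published as `google/fnet-base`:

```python
from transformers import pipeline

# "google/fnet-base" is assumed to be the published FNet checkpoint for this card.
unmasker = pipeline("fill-mask", model="google/fnet-base")

# The same prompts as above, run against the FNet weights.
print(unmasker("The man worked as a [MASK]."))
print(unmasker("The woman worked as a [MASK]."))
```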
  ## Training data
 
+ The FNet model was pretrained on [C4](https://huggingface.co/datasets/c4), a cleaned version of the Common Crawl dataset.
 
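C4 is hosted on the Hugging Face Hub, so a few pretraining-style documents can be inspected without downloading the full corpus. A minimal sketch with the `datasets` library (the `en` configuration and streaming mode are assumptions about the relevant subset):

```python
from datasets import load_dataset

# Stream the English configuration of C4 instead of downloading the whole corpus.
c4 = load_dataset("c4", "en", split="train", streaming=True)

# Peek at the first few raw documents of the kind used for pretraining.
for i, example in enumerate(c4):
    print(example["text"][:200])
    if i == 2:
        break
```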
  ## Training procedure
 
 ### Preprocessing
 
+ The texts are lowercased and tokenized using SentencePiece and a vocabulary size of 32,000. The inputs of the model are
 then of the form:
 
 ```
 
 ### Pretraining
 
 The model was trained on 4 cloud TPUs in Pod configuration (16 TPU chips total) for one million steps with a batch size
+ of 256. The sequence length was limited to 512 tokens. The optimizer
 used is Adam with a learning rate of 1e-4, \\(\beta_{1} = 0.9\\) and \\(\beta_{2} = 0.999\\), a weight decay of 0.01,
 learning rate warmup for 10,000 steps and linear decay of the learning rate after.
 
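As a rough illustration of those optimizer settings, here is a PyTorch sketch of the stated hyperparameters (not the original TPU training code; `AdamW` stands in for Adam with decoupled weight decay):

```python
import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(8, 8)  # placeholder module standing in for the FNet model

# lr 1e-4, betas (0.9, 0.999), weight decay 0.01, as stated above.
optimizer = torch.optim.AdamW(
    model.parameters(), lr=1e-4, betas=(0.9, 0.999), weight_decay=0.01
)

# Linear warmup for the first 10,000 steps, then linear decay over 1M total steps.
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=10_000, num_training_steps=1_000_000
)
```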
 
 | Task | MNLI-(m/mm) | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE | Average |
 |:----:|:-----------:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|:-------:|
+ | | 72/73 | 84 | 80 | 95 | 69 | 79 | 76 | 63 | 76.7 |
 
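The GLUE numbers come from fine-tuning the pretrained checkpoint on each task. A hedged sketch of one such run with the `transformers` Trainer, again assuming the checkpoint name `google/fnet-base` (the hyperparameters here are illustrative, not the ones behind the table):

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

checkpoint = "google/fnet-base"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# MRPC is one of the GLUE tasks reported above (sentence-pair classification).
raw = load_dataset("glue", "mrpc")

def tokenize(batch):
    return tokenizer(batch["sentence1"], batch["sentence2"], truncation=True, max_length=512)

encoded = raw.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="fnet-base-finetuned-mrpc",
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,
)
trainer.train()
```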
 
 ### BibTeX entry and citation info
 
 ```bibtex
+ @article{DBLP:journals/corr/abs-2105-03824,
+ author = {James Lee{-}Thorp and
+ Joshua Ainslie and
+ Ilya Eckstein and
+ Santiago Onta{\~{n}}{\'{o}}n},
+ title = {FNet: Mixing Tokens with Fourier Transforms},
 journal = {CoRR},
+ volume = {abs/2105.03824},
+ year = {2021},
+ url = {https://arxiv.org/abs/2105.03824},
 archivePrefix = {arXiv},
+ eprint = {2105.03824},
+ timestamp = {Fri, 14 May 2021 12:13:30 +0200},
+ biburl = {https://dblp.org/rec/journals/corr/abs-2105-03824.bib},
 bibsource = {dblp computer science bibliography, https://dblp.org}
 }
+ ```