patrickvonplaten committed on
Commit
5b0557f
1 Parent(s): b5c7a9c

Update README.md

Files changed (1)
  1. README.md +20 -31
README.md CHANGED
@@ -85,39 +85,28 @@ FNet-base was fine-tuned and evaluated on the validation data of the [GLUE bench
85
  For comparison, this model (ported to PyTorch) was fine-tuned and evaluated alongside [bert-base-cased](https://hf.co/models/bert-base-cased) using the [official Hugging Face GLUE evaluation scripts](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification#glue-tasks).
86
  The training was done on a single 16GB NVIDIA Tesla V100 GPU. The models were trained for 5 epochs on MRPC/WNLI and for 3 epochs on the other tasks, with a sequence length of 512, a batch size of 16, and a learning rate of 2e-5.
87
 
88
- The following table summarizes the results for [fnet-base](https://huggingface.co/google/fnet-base) (called *FNet (PyTorch) - Reproduced*) and [bert-base-cased](https://hf.co/models/bert-base-cased) (called *Bert (PyTorch) - Reproduced*) both in terms of performance and training times and compares it to the reported performance of the official FNet-base model (called *FNet (Flax) - Official*).
89
  For more details, please refer to the checkpoints linked with the scores. An overview of all fine-tuned checkpoints in the following tables can be accessed [here](https://huggingface.co/models?other=fnet-bert-base-comparison).
90
 
91
- | Task | Metric | Result | | | Training time | |
92
- | ----- | ---------------------- | --------------------------------------------------------------|----------------- | ------------------------------------------------------------------------- | ------------- | -------- |
93
- | | | Bert (PyTorch) - Reproduced | FNet (PyTorch) - Reproduced | FNet (Flax) - Official | Bert (PyTorch) - Reproduced | FNet (PyTorch) - Reproduced |
94
- | MNLI | Accuracy or Match/Mismatch | [84.10](https://huggingface.co/gchhablani/bert-base-cased-finetuned-mnli) (Accuracy) | [76.75](https://huggingface.co/gchhablani/fnet-base-finetuned-mnli) (Accuracy) | 72/73 (Match/Mismatch) | 09:52:33 | 06:40:55 |
95
- | QQP | mean(Accuracy,F1) | [89.26](https://huggingface.co/gchhablani/bert-base-cased-finetuned-qqp) | [86.5](https://huggingface.co/gchhablani/fnet-base-finetuned-qqp) | 83 | 09:25:01 | 06:21:16 |
96
- | QNLI | Accuracy | [90.99](https://huggingface.co/gchhablani/bert-base-cased-finetuned-qnli) | [84.39](https://huggingface.co/gchhablani/fnet-base-finetuned-qnli) | 80 |02:40:22 | 01:48:22 |
97
- | SST-2 | Accuracy | [92.32](https://huggingface.co/gchhablani/bert-base-cased-finetuned-sst2) | [89.45](https://huggingface.co/gchhablani/fnet-base-finetuned-sst2) | 95 | 01:42:17 | 01:09:27 |
98
- | CoLA | Matthews corr or Accuracy | [59.57](https://huggingface.co/gchhablani/bert-base-cased-finetuned-cola) (Matthews corr) | [35.94](https://huggingface.co/gchhablani/fnet-base-finetuned-cola) (Matthews Corr) | 69 (Accuracy) | 14:20 | 09:47 |
99
- | STS-B | Spearman corr. | [88.98](https://huggingface.co/gchhablani/bert-base-cased-finetuned-stsb) | [82.19](https://huggingface.co/gchhablani/fnet-base-finetuned-stsb) | 79 |10:24 | 07:09 |
100
- | MRPC | mean(F1/Accuracy) | [88.15](https://huggingface.co/gchhablani/bert-base-cased-finetuned-mrpc) | [81.15](https://huggingface.co/gchhablani/fnet-base-finetuned-mrpc) | 76 |11:12 | 07:48 |
101
- | RTE | Accuracy | [67.15](https://huggingface.co/gchhablani/bert-base-cased-finetuned-qnli) | [62.82](https://huggingface.co/gchhablani/fnet-base-finetuned-qnli) | 63 |04:51 | 03:24 |
102
- | WNLI | Accuracy | [46.48](https://huggingface.co/gchhablani/bert-base-cased-finetuned-wnli) | [54.93](https://huggingface.co/gchhablani/fnet-base-finetuned-wnli) | - |03:23 | 02:37 |
103
-
104
- | Task | Training time | | Metric | Result | | |
105
- | ----- | ---------------------- | ------------- | -------- | -------------------------------------------------------------- |----------------- | ------------------------------------------------------------------------- |
106
- | | Bert (PyTorch) - Reproduced | FNet (PyTorch) - Reproduced | | Bert (PyTorch) - Reproduced | FNet (PyTorch) - Reproduced | FNet (Flax) - Official |
107
- | MNLI | 09:52:33 | 06:40:55 |Accuracy or Match/Mismatch | [84.10](https://huggingface.co/gchhablani/bert-base-cased-finetuned-mnli) (Accuracy) | [76.75](https://huggingface.co/gchhablani/fnet-base-finetuned-mnli) (Accuracy) | 72/73 (Match/Mismatch) |
108
- | QQP | 09:25:01 | 06:21:16 |mean(Accuracy,F1) | [89.26](https://huggingface.co/gchhablani/bert-base-cased-finetuned-qqp) | [86.5](https://huggingface.co/gchhablani/fnet-base-finetuned-qqp) | 83 |
109
- | QNLI | 02:40:22 | 01:48:22 |Accuracy | [90.99](https://huggingface.co/gchhablani/bert-base-cased-finetuned-qnli) | [84.39](https://huggingface.co/gchhablani/fnet-base-finetuned-qnli) | 80 |
110
- | SST-2 | 01:42:17 | 01:09:27 | Accuracy | [92.32](https://huggingface.co/gchhablani/bert-base-cased-finetuned-sst2) | [89.45](https://huggingface.co/gchhablani/fnet-base-finetuned-sst2) | 95 |
111
- | CoLA | 14:20 | 09:47 | Matthews corr or Accuracy | [59.57](https://huggingface.co/gchhablani/bert-base-cased-finetuned-cola) (Matthews corr) | [35.94](https://huggingface.co/gchhablani/fnet-base-finetuned-cola) (Matthews Corr) | 69 (Accuracy) |
112
- | STS-B | 10:24 | 07:09 |Spearman corr. | [88.98](https://huggingface.co/gchhablani/bert-base-cased-finetuned-stsb) | [82.19](https://huggingface.co/gchhablani/fnet-base-finetuned-stsb) | 79 |
113
- | MRPC | 11:12 | 07:48 | mean(F1/Accuracy) | [88.15](https://huggingface.co/gchhablani/bert-base-cased-finetuned-mrpc) | [81.15](https://huggingface.co/gchhablani/fnet-base-finetuned-mrpc) | 76 |
114
- | RTE | 04:51 | 03:24 | Accuracy | [67.15](https://huggingface.co/gchhablani/bert-base-cased-finetuned-qnli) | [62.82](https://huggingface.co/gchhablani/fnet-base-finetuned-qnli) | 63 |
115
- | WNLI | 03:23 | 02:37 |Accuracy | [46.48](https://huggingface.co/gchhablani/bert-base-cased-finetuned-wnli) | [54.93](https://huggingface.co/gchhablani/fnet-base-finetuned-wnli) | - |
116
-
117
-
118
-
119
- We can see that FNet-base achieves around 93% of BERT-base's performance while it requires *ca.* 30% less time to fine-tune on the downstream tasks.
120
-
121
  ### How to use
122
 
123
  You can use this model directly with a pipeline for masked language modeling:
 
85
  For comparison, this model (ported to PyTorch) was fine-tuned and evaluated alongside [bert-base-cased](https://hf.co/models/bert-base-cased) using the [official Hugging Face GLUE evaluation scripts](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification#glue-tasks).
86
  The training was done on a single 16GB NVIDIA Tesla V100 GPU. The models were trained for 5 epochs on MRPC/WNLI and for 3 epochs on the other tasks, with a sequence length of 512, a batch size of 16, and a learning rate of 2e-5.
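The scores below come from the official `run_glue.py` evaluation script linked above. Purely as an illustration, the following minimal `Trainer`-based sketch mirrors those hyperparameters; it assumes `transformers` >= 4.12 (which added FNet), the `datasets` library, SST-2 as the example task, and a hypothetical output directory, and it is not the command that produced the reported numbers.

```python
# Illustrative sketch only: fine-tune google/fnet-base on a GLUE task (here SST-2)
# with the hyperparameters stated above (max length 512, batch size 16, lr 2e-5,
# 3 epochs; 5 epochs for MRPC/WNLI). The reported scores were produced with the
# official run_glue.py script, not with this snippet.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

raw = load_dataset("glue", "sst2")  # SST-2 has a single "sentence" column
tokenizer = AutoTokenizer.from_pretrained("google/fnet-base")
model = AutoModelForSequenceClassification.from_pretrained("google/fnet-base", num_labels=2)

def tokenize(batch):
    # Pad/truncate to the sequence length of 512 used in the experiments above.
    return tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=512)

encoded = raw.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="fnet-base-finetuned-sst2",  # hypothetical output directory
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
)
trainer.train()
```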
87
 
88
+ The following table summarizes the fine-tuning times for [fnet-base](https://huggingface.co/google/fnet-base) (*FNet-base (PyTorch)*) and [bert-base-cased](https://hf.co/models/bert-base-cased) (*Bert-base (PyTorch)*). Times are given in *hours:minutes:seconds*.
89
+
90
+ | Model | MNLI-(m/mm) | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE | WNLI | Total |
91
+ |:----:|:-----------:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|:----:|:-------:|
92
+ |FNet-base (PyTorch)| [06:40:55](https://huggingface.co/gchhablani/fnet-base-finetuned-mnli)| [06:21:16](https://huggingface.co/gchhablani/fnet-base-finetuned-qqp) | [01:48:22](https://huggingface.co/gchhablani/fnet-base-finetuned-qnli) | [01:09:27](https://huggingface.co/gchhablani/fnet-base-finetuned-sst2) | [00:09:47](https://huggingface.co/gchhablani/fnet-base-finetuned-cola) | [00:07:09](https://huggingface.co/gchhablani/fnet-base-finetuned-stsb) | [00:07:48](https://huggingface.co/gchhablani/fnet-base-finetuned-mrpc) | [00:03:24](https://huggingface.co/gchhablani/fnet-base-finetuned-rte) | [00:02:37](https://huggingface.co/gchhablani/fnet-base-finetuned-wnli) | 16:30:45 |
93
+ |Bert-base (PyTorch)| [09:52:33](https://huggingface.co/gchhablani/bert-base-cased-finetuned-mnli)| [09:25:01](https://huggingface.co/gchhablani/bert-base-cased-finetuned-qqp) | [02:40:22](https://huggingface.co/gchhablani/bert-base-cased-finetuned-qnli) | [01:42:17](https://huggingface.co/gchhablani/bert-base-cased-finetuned-sst2) | [00:14:20](https://huggingface.co/gchhablani/bert-base-cased-finetuned-cola) | [00:10:24](https://huggingface.co/gchhablani/bert-base-cased-finetuned-stsb) | [00:11:12](https://huggingface.co/gchhablani/bert-base-cased-finetuned-mrpc) | [00:04:51](https://huggingface.co/gchhablani/bert-base-cased-finetuned-rte) | [00:03:23](https://huggingface.co/gchhablani/bert-base-cased-finetuned-wnli) | 24:23:56 |
94
+
95
+ On average, the PyTorch version of FNet-base requires *ca.* 30% less time for GLUE fine-tuning on GPU.
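As a quick, illustrative sanity check (not part of the evaluation scripts), the *ca.* 30% figure can be recomputed from the totals in the table above:

```python
# Recompute the time saving from the "Total" column above (illustrative only).
def to_seconds(hms: str) -> int:
    h, m, s = (int(x) for x in hms.split(":"))
    return 3600 * h + 60 * m + s

fnet_total = to_seconds("16:30:45")  # FNet-base (PyTorch)
bert_total = to_seconds("24:23:56")  # Bert-base (PyTorch)
print(f"{1 - fnet_total / bert_total:.0%} less fine-tuning time")  # -> 32% less fine-tuning time
```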
96
+
97
+ The following table summarizes the GLUE results for [fnet-base](https://huggingface.co/google/fnet-base) (*FNet-base (PyTorch)*) and [bert-base-cased](https://hf.co/models/bert-base-cased) (*Bert-base (PyTorch)*) and compares them to the reported performance of the official FNet-base model (*FNet-Base (Flax - official)*).
98
+
99
+ | Model | MNLI-(m/mm) | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE | WNLI | Avg |
100
+ |:----:|:-----------:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|:----:|:-------:|
101
+ | Metric | Accuracy or Match/Mismatch | mean(Accuracy,F1) | Accuracy | Accuracy | Matthews corr or Accuracy | Spearman corr. | mean(F1/Accuracy) | Accuracy | Accuracy | - |
102
+ |FNet-base (PyTorch)| [76.75](https://huggingface.co/gchhablani/fnet-base-finetuned-mnli)| [86.5](https://huggingface.co/gchhablani/fnet-base-finetuned-qqp) | [84.39](https://huggingface.co/gchhablani/fnet-base-finetuned-qnli) | [89.45](https://huggingface.co/gchhablani/fnet-base-finetuned-sst2) | [35.94](https://huggingface.co/gchhablani/fnet-base-finetuned-cola) | [82.19](https://huggingface.co/gchhablani/fnet-base-finetuned-stsb) | [81.15](https://huggingface.co/gchhablani/fnet-base-finetuned-mrpc) | [62.82](https://huggingface.co/gchhablani/fnet-base-finetuned-rte) | [54.93](https://huggingface.co/gchhablani/fnet-base-finetuned-wnli) | - |
103
+ |Bert-base (PyTorch)| [84.10](https://huggingface.co/gchhablani/bert-base-cased-finetuned-mnli)| [89.26](https://huggingface.co/gchhablani/bert-base-cased-finetuned-qqp) | [90.99](https://huggingface.co/gchhablani/bert-base-cased-finetuned-qnli) | [92.32](https://huggingface.co/gchhablani/bert-base-cased-finetuned-sst2) | [59.57](https://huggingface.co/gchhablani/bert-base-cased-finetuned-cola) | [88.98](https://huggingface.co/gchhablani/bert-base-cased-finetuned-stsb) | [88.15](https://huggingface.co/gchhablani/bert-base-cased-finetuned-mrpc) | [67.15](https://huggingface.co/gchhablani/bert-base-cased-finetuned-rte) | [46.48](https://huggingface.co/gchhablani/bert-base-cased-finetuned-wnli) | - |
104
+ | FNet-Base (Flax - official) | 72/73 | 83 | 80 | 95 | 69 | 79 | 76 | 63 | - | 76.7 |
105
+
106
+ We can see that FNet-base achieves around 93% of BERT-base's performance on average.
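Similarly, the "around 93%" figure can be recomputed from the two reproduced rows of the table above (an illustrative check only):

```python
# Recompute the average-score ratio from the table above (task order:
# MNLI, QQP, QNLI, SST-2, CoLA, STS-B, MRPC, RTE, WNLI). Illustrative only.
fnet = [76.75, 86.5, 84.39, 89.45, 35.94, 82.19, 81.15, 62.82, 54.93]
bert = [84.10, 89.26, 90.99, 92.32, 59.57, 88.98, 88.15, 67.15, 46.48]
ratio = (sum(fnet) / len(fnet)) / (sum(bert) / len(bert))
print(f"FNet-base reaches {ratio:.1%} of Bert-base's average GLUE score")  # -> 92.5%
```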
107
+
108
  For more details, please refer to the checkpoints linked with the scores. An overview of all fine-tuned checkpoints in the tables above can be accessed [here](https://huggingface.co/models?other=fnet-bert-base-comparison).
109
 
110
  ### How to use
111
 
112
  You can use this model directly with a pipeline for masked language modeling:
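A minimal sketch of such a pipeline call (assuming `transformers` >= 4.12 with FNet support and `sentencepiece` installed; FNet uses the `[MASK]` token):

```python
from transformers import pipeline

# Load the fill-mask pipeline with google/fnet-base; the tokenizer is fetched automatically.
unmasker = pipeline("fill-mask", model="google/fnet-base")

# FNet's mask token is [MASK]; the call returns the top predicted fillers with their scores.
unmasker("Hello, I'm a [MASK] model.")
```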