yhavinga committed on
Commit f2582be
1 Parent(s): 5c4ca05

Update README.md

Files changed (1): README.md +19 -16
README.md CHANGED
@@ -20,7 +20,7 @@ Hugging Face Spaces for the **[Netherformer 📰](https://huggingface.co/spaces/
 
  ## Tokenizer
 
- * Tokenizer trained from scratch for Dutch on mC4 nl cleaned with scripts from the Huggingface
+ * SentencePiece tokenizer trained from scratch for Dutch on mC4 nl cleaned with scripts from the Huggingface
  Transformers [Flax examples](https://github.com/huggingface/transformers/tree/master/examples/flax/language-modeling).
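For reference, this tokenizer ships with each of the models listed below and can also be loaded on its own. A minimal sketch with 🤗 Transformers, using the `yhavinga/t5-base-dutch` repository id from the tables below:

```python
# Minimal sketch: load the Dutch tokenizer from the Hub and round-trip a sentence.
# The repository id is taken from the model tables in this README.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("yhavinga/t5-base-dutch")

tokens = tokenizer.tokenize("Het is een mooie dag in Nederland.")
print(tokens)                                      # learned Dutch subword pieces
print(tokenizer.convert_tokens_to_string(tokens))  # round-trips to the input
```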
 
  ## Dataset
@@ -38,9 +38,11 @@ which is the original mC4, except
 
  ## Models
 
- * The first model, `t5-base-dutch`, is a re-training of the Dutch T5 base v1.0 model trained during the Flax/Jax community
- week. With training complete, accuracy was improved from 0.64 to 0.70.
- * The second two models are an uncased and a cased version of `t5-v1.1-base`, again pre-trained from scratch on Dutch,
+ TL;DR: [yhavinga/t5-v1.1-base-dutch-cased](https://huggingface.co/yhavinga/t5-v1.1-base-dutch-cased) is the best model.
+
+ * `yhavinga/t5-base-dutch` is a re-training of the Dutch T5 base v1.0 model trained during the summer 2021
+ Flax/Jax community week. Accuracy was improved from 0.64 to 0.70.
+ * The two T5 v1.1 base models are an uncased and a cased version of `t5-v1.1-base`, again pre-trained from scratch on Dutch,
  with a tokenizer also trained from scratch. The T5 v1.1 models are slightly different from the T5 models, and the
  base models are trained with a dropout of 0.0. For fine-tuning, this should be set back to 0.1, as sketched below.
  * The large cased model is a pre-trained Dutch version of `t5-v1.1-large`. Training of t5-v1.1-large proved difficult.
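Because the v1.1 base checkpoints were pre-trained with dropout 0.0, fine-tuning code would typically restore dropout when loading the model. A minimal sketch, assuming the `yhavinga/t5-v1.1-base-dutch-cased` checkpoint from the table below:

```python
# Minimal sketch: load a checkpoint pre-trained with dropout 0.0 and set
# dropout back to 0.1 for fine-tuning; keyword overrides update the config.
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained(
    "yhavinga/t5-v1.1-base-dutch-cased",
    dropout_rate=0.1,  # pre-training used 0.0; restore 0.1 for fine-tuning
)
print(model.config.dropout_rate)  # 0.1
```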
@@ -49,19 +51,19 @@ which is the original mC4, except
  The latest checkpoint, training scripts and metrics are available for reference. For actual fine-tuning, the cased
  base model is probably the better choice.
 
- | | model | train seq len | acc | loss | batch size | epochs | steps | dropout | optim | lr | duration |
- |----------------------------|---------|---------------|----------|----------|------------|--------|---------|---------|-----------|------|----------|
- | t5-base-dutch | T5 | 512 | 0.70 | 1.38 | 128 | 1 | 528481 | 0.1 | adafactor | 5e-3 | 2d 9h |
- | t5-v1.1-base-dutch-uncased | t5-v1.1 | 1024 | 0.73 | 1.20 | 64 | 2 | 1014525 | 0.0 | adafactor | 5e-3 | 5d 5h |
- | t5-v1.1-base-dutch-cased | t5-v1.1 | 1024 | **0.78** | **0.96** | 64 | 2 | 1210000 | 0.0 | adafactor | 5e-3 | 6d 6h |
- | t5-v1.1-large-dutch-cased | t5-v1.1 | 512 | 0.76 | 1.07 | 64 | 1 | 1120000 | 0.1 | adafactor | 5e-3 | 86 13h |
+ | | model | train seq len | acc | loss | batch size | epochs | steps | dropout | optim | lr | duration |
+ |---------------------------------------------------------------------------------------------------|---------|---------------|----------|----------|------------|--------|---------|---------|-----------|------|----------|
+ | [yhavinga/t5-base-dutch](https://huggingface.co/yhavinga/t5-base-dutch) | T5 | 512 | 0.70 | 1.38 | 128 | 1 | 528481 | 0.1 | adafactor | 5e-3 | 2d 9h |
+ | [yhavinga/t5-v1.1-base-dutch-uncased](https://huggingface.co/yhavinga/t5-v1.1-base-dutch-uncased) | t5-v1.1 | 1024 | 0.73 | 1.20 | 64 | 2 | 1014525 | 0.0 | adafactor | 5e-3 | 5d 5h |
+ | [yhavinga/t5-v1.1-base-dutch-cased](https://huggingface.co/yhavinga/t5-v1.1-base-dutch-cased) | t5-v1.1 | 1024 | **0.78** | **0.96** | 64 | 2 | 1210000 | 0.0 | adafactor | 5e-3 | 6d 6h |
+ | [yhavinga/t5-v1.1-large-dutch-cased](https://huggingface.co/yhavinga/t5-v1.1-large-dutch-cased) | t5-v1.1 | 512 | 0.76 | 1.07 | 64 | 1 | 1120000 | 0.1 | adafactor | 5e-3 | 86 13h |
 
  The cased t5-v1.1 Dutch models were fine-tuned on summarizing the CNN Daily Mail dataset.
 
- | | model | input len | target len | Rouge1 | Rouge2 | RougeL | RougeLsum | Test Gen Len | epochs | batch size | steps | duration |
- |------------------------------|---------|-----------|------------|--------|--------|--------|-----------|--------------|--------|------------|-------|----------|
- | t5-v1.1-base-dutch-cnn-test | t5-v1.1 | 1024 | 96 | 34.8 | 13.6 | 25.2 | 32.1 | 79 | 6 | 64 | 26916 | 2h 40m |
- | t5-v1.1-large-dutch-cnn-test | t5-v1.1 | 1024 | 96 | 34.4 | 13.6 | 25.3 | 31.7 | 81 | 5 | 16 | 89720 | 11h |
+ | | model | input len | target len | Rouge1 | Rouge2 | RougeL | RougeLsum | Test Gen Len | epochs | batch size | steps | duration |
+ |-------------------------------------------------------------------------------------------------------|---------|-----------|------------|--------|--------|--------|-----------|--------------|--------|------------|-------|----------|
+ | [yhavinga/t5-v1.1-base-dutch-cnn-test](https://huggingface.co/yhavinga/t5-v1.1-base-dutch-cnn-test) | t5-v1.1 | 1024 | 96 | 34.8 | 13.6 | 25.2 | 32.1 | 79 | 6 | 64 | 26916 | 2h 40m |
+ | [yhavinga/t5-v1.1-large-dutch-cnn-test](https://huggingface.co/yhavinga/t5-v1.1-large-dutch-cnn-test) | t5-v1.1 | 1024 | 96 | 34.4 | 13.6 | 25.3 | 31.7 | 81 | 5 | 16 | 89720 | 11h |
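To illustrate what these fine-tuned checkpoints do, a minimal inference sketch with the `summarization` pipeline; the generation settings here are illustrative assumptions, not the settings used for the Rouge scores above:

```python
# Minimal sketch: summarize a Dutch news article with the fine-tuned model.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="yhavinga/t5-v1.1-base-dutch-cnn-test",
)

article = "..."  # a Dutch news article; inputs beyond 1024 tokens should be truncated
summary = summarizer(article, max_length=96, min_length=20, do_sample=False)
print(summary[0]["summary_text"])
```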
 
 
  ## Acknowledgements
@@ -69,9 +71,10 @@ The cased t5-v1.1 Dutch models were fine-tuned on summarizing the CNN Daily Mail
  This project would not have been possible without compute generously provided by Google through the
  [TPU Research Cloud](https://sites.research.google/trc/). The HuggingFace 🤗 ecosystem was also
  instrumental in many, if not all, parts of the training. The following repositories were helpful in setting up the TPU-VM,
- and getting an idea of what sensible hyper-parameters are for training gpt2 from scratch.
+ and training the models:
 
  * [Gsarti's Pretrain and Fine-tune a T5 model with Flax on GCP](https://github.com/gsarti/t5-flax-gcp)
+ * [HuggingFace Flax MLM examples](https://github.com/huggingface/transformers/tree/master/examples/flax/language-modeling)
  * [Flax/Jax Community week t5-base-dutch](https://huggingface.co/flax-community/t5-base-dutch)
 
- Created by [Yeb Havinga](https://www.linkedin.com/in/yeb-havinga-86530825/)
+ Created by [Yeb Havinga](https://www.linkedin.com/in/yeb-havinga-86530825/)
 