yhavinga committed on
Commit f2582be
1 Parent(s): 5c4ca05

Update README.md

Files changed (1): README.md +19 -16
README.md CHANGED
@@ -20,7 +20,7 @@ Hugging Face Spaces for the **[Netherformer 📰](https://huggingface.co/spaces/
 
  ## Tokenizer
 
- * Tokenizer trained from scratch for Dutch on mC4 nl cleaned with scripts from the Huggingface
+ * SentencePiece tokenizer trained from scratch for Dutch on mC4 nl cleaned with scripts from the Huggingface
  Transformers [Flax examples](https://github.com/huggingface/transformers/tree/master/examples/flax/language-modeling).
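For reference, this tokenizer ships with each of the models listed below and can also be loaded on its own. A minimal sketch with 🤗 Transformers, using the `yhavinga/t5-base-dutch` repository id from the tables below:

```python
# Minimal sketch: load the Dutch tokenizer from the Hub and round-trip a sentence.
# The repository id is taken from the model tables in this README.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("yhavinga/t5-base-dutch")

tokens = tokenizer.tokenize("Het is een mooie dag in Nederland.")
print(tokens)                                      # learned Dutch subword pieces
print(tokenizer.convert_tokens_to_string(tokens))  # round-trips to the input
```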
 
  ## Dataset
@@ -38,9 +38,11 @@ which is the original mC4, except
 
  ## Models
 
- * The first model, `t5-base-dutch`, is a re-training of the Dutch T5 base v1.0 model trained during the Flax/Jax community
- week. With training complete, accuracy was improved from 0.64 to 0.70.
- * The second two models are an uncased and a cased version of `t5-v1.1-base`, again pre-trained from scratch on Dutch,
+ TL;DR: [yhavinga/t5-v1.1-base-dutch-cased](https://huggingface.co/yhavinga/t5-v1.1-base-dutch-cased) is the best model.
+
+ * `yhavinga/t5-base-dutch` is a re-training of the Dutch T5 base v1.0 model trained during the summer 2021
+ Flax/Jax community week. Accuracy was improved from 0.64 to 0.70.
+ * The two T5 v1.1 base models are an uncased and a cased version of `t5-v1.1-base`, again pre-trained from scratch on Dutch,
  with a tokenizer also trained from scratch. The T5 v1.1 models are slightly different from the T5 models, and the
  base models are trained with a dropout of 0.0. For fine-tuning, this should be set back to 0.1, as sketched below.
  * The large cased model is a pre-trained Dutch version of `t5-v1.1-large`. Training of t5-v1.1-large proved difficult.
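Because the v1.1 base checkpoints were pre-trained with dropout 0.0, fine-tuning code would typically restore dropout when loading the model. A minimal sketch, assuming the `yhavinga/t5-v1.1-base-dutch-cased` checkpoint from the table below:

```python
# Minimal sketch: load a checkpoint pre-trained with dropout 0.0 and set
# dropout back to 0.1 for fine-tuning; keyword overrides update the config.
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained(
    "yhavinga/t5-v1.1-base-dutch-cased",
    dropout_rate=0.1,  # pre-training used 0.0; restore 0.1 for fine-tuning
)
print(model.config.dropout_rate)  # 0.1
```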
@@ -49,19 +51,19 @@ which is the original mC4, except
  The latest checkpoint, training scripts and metrics are available for reference. For actual fine-tuning, the cased
  base model is probably the better choice.
 
- | | model | train seq len | acc | loss | batch size | epochs | steps | dropout | optim | lr | duration |
- |----------------------------|---------|---------------|----------|----------|------------|--------|---------|---------|-----------|------|----------|
- | t5-base-dutch | T5 | 512 | 0.70 | 1.38 | 128 | 1 | 528481 | 0.1 | adafactor | 5e-3 | 2d 9h |
- | t5-v1.1-base-dutch-uncased | t5-v1.1 | 1024 | 0.73 | 1.20 | 64 | 2 | 1014525 | 0.0 | adafactor | 5e-3 | 5d 5h |
- | t5-v1.1-base-dutch-cased | t5-v1.1 | 1024 | **0.78** | **0.96** | 64 | 2 | 1210000 | 0.0 | adafactor | 5e-3 | 6d 6h |
- | t5-v1.1-large-dutch-cased | t5-v1.1 | 512 | 0.76 | 1.07 | 64 | 1 | 1120000 | 0.1 | adafactor | 5e-3 | 86 13h |
+ | | model | train seq len | acc | loss | batch size | epochs | steps | dropout | optim | lr | duration |
+ |---------------------------------------------------------------------------------------------------|---------|---------------|----------|----------|------------|--------|---------|---------|-----------|------|----------|
+ | [yhavinga/t5-base-dutch](https://huggingface.co/yhavinga/t5-base-dutch) | T5 | 512 | 0.70 | 1.38 | 128 | 1 | 528481 | 0.1 | adafactor | 5e-3 | 2d 9h |
+ | [yhavinga/t5-v1.1-base-dutch-uncased](https://huggingface.co/yhavinga/t5-v1.1-base-dutch-uncased) | t5-v1.1 | 1024 | 0.73 | 1.20 | 64 | 2 | 1014525 | 0.0 | adafactor | 5e-3 | 5d 5h |
+ | [yhavinga/t5-v1.1-base-dutch-cased](https://huggingface.co/yhavinga/t5-v1.1-base-dutch-cased) | t5-v1.1 | 1024 | **0.78** | **0.96** | 64 | 2 | 1210000 | 0.0 | adafactor | 5e-3 | 6d 6h |
+ | [yhavinga/t5-v1.1-large-dutch-cased](https://huggingface.co/yhavinga/t5-v1.1-large-dutch-cased) | t5-v1.1 | 512 | 0.76 | 1.07 | 64 | 1 | 1120000 | 0.1 | adafactor | 5e-3 | 86 13h |
 
  The cased t5-v1.1 Dutch models were fine-tuned on summarizing the CNN Daily Mail dataset.
 
- | | model | input len | target len | Rouge1 | Rouge2 | RougeL | RougeLsum | Test Gen Len | epochs | batch size | steps | duration |
- |------------------------------|---------|-----------|------------|--------|--------|--------|-----------|--------------|--------|------------|-------|----------|
- | t5-v1.1-base-dutch-cnn-test | t5-v1.1 | 1024 | 96 | 34.8 | 13.6 | 25.2 | 32.1 | 79 | 6 | 64 | 26916 | 2h 40m |
- | t5-v1.1-large-dutch-cnn-test | t5-v1.1 | 1024 | 96 | 34.4 | 13.6 | 25.3 | 31.7 | 81 | 5 | 16 | 89720 | 11h |
+ | | model | input len | target len | Rouge1 | Rouge2 | RougeL | RougeLsum | Test Gen Len | epochs | batch size | steps | duration |
+ |-------------------------------------------------------------------------------------------------------|---------|-----------|------------|--------|--------|--------|-----------|--------------|--------|------------|-------|----------|
+ | [yhavinga/t5-v1.1-base-dutch-cnn-test](https://huggingface.co/yhavinga/t5-v1.1-base-dutch-cnn-test) | t5-v1.1 | 1024 | 96 | 34.8 | 13.6 | 25.2 | 32.1 | 79 | 6 | 64 | 26916 | 2h 40m |
+ | [yhavinga/t5-v1.1-large-dutch-cnn-test](https://huggingface.co/yhavinga/t5-v1.1-large-dutch-cnn-test) | t5-v1.1 | 1024 | 96 | 34.4 | 13.6 | 25.3 | 31.7 | 81 | 5 | 16 | 89720 | 11h |
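To illustrate what these fine-tuned checkpoints do, a minimal inference sketch with the `summarization` pipeline; the generation settings here are illustrative assumptions, not the settings used for the Rouge scores above:

```python
# Minimal sketch: summarize a Dutch news article with the fine-tuned model.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="yhavinga/t5-v1.1-base-dutch-cnn-test",
)

article = "..."  # a Dutch news article; inputs beyond 1024 tokens should be truncated
summary = summarizer(article, max_length=96, min_length=20, do_sample=False)
print(summary[0]["summary_text"])
```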
 
 
  ## Acknowledgements
@@ -69,9 +71,10 @@ The cased t5-v1.1 Dutch models were fine-tuned on summarizing the CNN Daily Mail
  This project would not have been possible without compute generously provided by Google through the
  [TPU Research Cloud](https://sites.research.google/trc/). The HuggingFace 🤗 ecosystem was also
  instrumental in many, if not all, parts of the training. The following repositories were helpful in setting up the TPU-VM,
- and getting an idea of what sensible hyper-parameters are for training gpt2 from scratch.
+ and training the models:
 
  * [Gsarti's Pretrain and Fine-tune a T5 model with Flax on GCP](https://github.com/gsarti/t5-flax-gcp)
+ * [HuggingFace Flax MLM examples](https://github.com/huggingface/transformers/tree/master/examples/flax/language-modeling)
  * [Flax/Jax Community week t5-base-dutch](https://huggingface.co/flax-community/t5-base-dutch)
 
- Created by [Yeb Havinga](https://www.linkedin.com/in/yeb-havinga-86530825/)
+ Created by [Yeb Havinga](https://www.linkedin.com/in/yeb-havinga-86530825/)
 