cookiemonster committed on
Commit 355e704 • 1 Parent(s): 8794c71
update base models description for more clarity
https://discuss.huggingface.co/t/difference-in-dimensions-of-t0-vs-t5-models/26243
README.md
CHANGED
@@ -111,7 +111,7 @@ If you want to use another checkpoint, please replace the path in `AutoTokenizer
# Training procedure

-T0* models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4). We use the publicly available [language model-adapted T5 checkpoints](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#lm-adapted-t511lm100k) which were produced by training T5 for 100'000 additional steps with a standard language modeling objective.
+T0* models are based on [T5 v1.1](https://huggingface.co/google/t5-v1_1-large), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4). We use the publicly available [language model-adapted T5 checkpoints](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#lm-adapted-t511lm100k) which were produced by training T5 for 100'000 additional steps with a standard language modeling objective ([t5-*-lm-adapt](https://huggingface.co/google/t5-large-lm-adapt)).

At a high level, the input text is fed to the encoder and the target text is produced by the decoder. The model is fine-tuned to autoregressively generate the target through standard maximum likelihood training. It is never trained to generate the input. We detail our training data in the next section.
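The encoder-decoder behaviour described in the changed paragraph (input text consumed by the encoder, target text generated by the decoder) can be exercised with a few lines of `transformers` code. This is a minimal sketch, not part of the commit: the `bigscience/T0pp` checkpoint name and the example prompt are illustrative assumptions, and any other T0* checkpoint can be substituted as noted in the README's `AutoTokenizer` instructions.

```python
# Minimal sketch (assumptions: `transformers` installed; bigscience/T0pp used as an example T0* checkpoint).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "bigscience/T0pp"  # replace with the path of another checkpoint if desired

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# The prompt (input text) is fed to the encoder; the model is never trained to generate it.
inputs = tokenizer(
    "Is this review positive or negative? Review: this soundtrack is amazing!",
    return_tensors="pt",
)

# The decoder autoregressively generates the target text conditioned on the encoded input.
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```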