cookiemonster committed
Commit 355e704
1 Parent(s): 8794c71

update base models description for more clarity


https://discuss.huggingface.co/t/difference-in-dimensions-of-t0-vs-t5-models/26243

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -111,7 +111,7 @@ If you want to use another checkpoint, please replace the path in `AutoTokenizer
 
 # Training procedure
 
-T0* models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4). We use the publicly available [language model-adapted T5 checkpoints](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#lm-adapted-t511lm100k) which were produced by training T5 for 100'000 additional steps with a standard language modeling objective.
+T0* models are based on [T5 v1.1](https://huggingface.co/google/t5-v1_1-large), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4). We use the publicly available [language model-adapted T5 checkpoints](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#lm-adapted-t511lm100k) which were produced by training T5 for 100'000 additional steps with a standard language modeling objective ([t5-*-lm-adapt](https://huggingface.co/google/t5-large-lm-adapt)).
 
 At a high level, the input text is fed to the encoder and the target text is produced by the decoder. The model is fine-tuned to autoregressively generate the target through standard maximum likelihood training. It is never trained to generate the input. We detail our training data in the next section.
 
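
For context, the encoder-decoder flow described in the changed paragraph can be illustrated with a minimal sketch, assuming the `transformers` Python library and the `bigscience/T0pp` checkpoint (any other T0* checkpoint path could be substituted; the prompt text is illustrative only):

```python
# Minimal sketch of the encoder-decoder flow described above.
# Assumptions: `transformers` is installed and `bigscience/T0pp` is the
# chosen checkpoint; this is not prescribed by the commit itself.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "bigscience/T0pp"  # illustrative choice of T0* checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# The prompted input text is fed to the encoder...
inputs = tokenizer(
    "Is this review positive or negative? "
    "Review: this is the best cast iron skillet you will ever buy",
    return_tensors="pt",
)
# ...and the decoder autoregressively generates the target text.
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```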