---
license: apache-2.0
datasets:
- HuggingFaceTB/smollm-corpus
language:
- en
pipeline_tag: text2text-generation
library_name: transformers
---
# tFINE-850m-24x24-1024ctx

A T5 model pretrained with [nanoT5](https://github.com/pszemraj/nanoT5/tree/fineweb-edu-test) (a usage sketch follows the list below):

- ~850m parameters: 24 encoder layers, 24 decoder layers
- SentencePiece tokenizer with a 48k vocab and byte-pair fallback
  - handles whitespace etc. correctly (_unlike the original T5 tokenizer_)
- 1024-token context length during pretraining
- `relative_attention_num_buckets` increased from 32 to 48 for context length upscaling
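
Since the card lists `library_name: transformers`, the checkpoint should load like any other T5 model. Below is a minimal usage sketch; the hub repo id and the `<extra_id_N>` sentinel naming are assumptions following standard T5 conventions, not details confirmed above. As a pretraining-only checkpoint, it produces fill-in-the-blank completions rather than instruction-following output.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# assumed hub path based on the model name; adjust if the repo lives elsewhere
model_id = "pszemraj/tFINE-850m-24x24-1024ctx"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# sanity-check the config tweak described above (vanilla T5 uses 32 buckets)
print(model.config.relative_attention_num_buckets)

# span-corruption pretraining means the model expects sentinel tokens,
# assuming the tokenizer keeps the usual <extra_id_N> convention
text = "The capital of France is <extra_id_0>, famous for the <extra_id_1> Tower."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```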
## Experiment logs

Training consisted of two phases:

- TODO
- TODO