---
license: apache-2.0
datasets:
- HuggingFaceTB/smollm-corpus
language:
- en
pipeline_tag: text2text-generation
library_name: transformers
---

# tFINE-900m-e16-d32-1024ctx

Pretrained T5 model with [nanoT5](https://github.com/pszemraj/nanoT5/tree/fineweb-edu-test):

- ~900m parameters, 16 encoder layers, 32 decoder layers
- SentencePiece tokenizer with a 48k vocab & byte-pair fallback
- handles whitespace etc. correctly (_unlike the original T5 tokenizer_)
- 1024 context length during pretraining
- `relative_attention_num_buckets` increased from 32 to 48 for context-length upscaling

## Experiment logs

Training consisted of two phases:

- [phase one](https://wandb.ai/pszemraj/nanoT5/runs/l0y9uuv3) - ~30k steps at context length 512
- [phase two](https://wandb.ai/pszemraj/nanoT5/runs/mao0tqjy) - 20k steps at context length 1024
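
## Usage

A minimal inference sketch with the `transformers` library. The repo ID below is assumed from the model name on this card, and the `<extra_id_0>`-style sentinel tokens are assumed to follow the standard T5 convention; since this is a pretrained (span-corruption) checkpoint rather than a fine-tuned one, raw generations will reflect the denoising objective rather than any downstream task.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Assumed repo ID, inferred from the model name on this card
model_id = "pszemraj/tFINE-900m-e16-d32-1024ctx"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# Span-corruption style input: the sentinel token names are assumed to match
# the standard T5 convention used by the pretraining objective.
text = "The capital of France is <extra_id_0>, which is known for the <extra_id_1> Tower."
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```

For real use, the checkpoint would typically be fine-tuned on a downstream text2text task (e.g. summarization or translation) rather than prompted directly.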