tFINE-900m-e16-d32-1024ctx

A T5 model pretrained with nanoT5:

~900M parameters: 16 encoder layers, 32 decoder layers
SentencePiece tokenizer with a 48k vocabulary and byte-pair fallback
- handles whitespace etc. correctly (unlike the original T5 tokenizer)
context length of 1024 during pretraining
relative_attention_num_buckets increased from 32 to 48 to support context length upscaling
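
To illustrate what relative_attention_num_buckets controls, here is a minimal pure-Python sketch of T5-style relative position bucketing, written for a single scalar offset. It follows the commonly used T5 bucketing scheme; the function name `relative_position_bucket` and the value `max_distance=128` (the usual T5 default) are assumptions for illustration, not values confirmed for this model.

```python
import math


def relative_position_bucket(relative_position, bidirectional=True,
                             num_buckets=32, max_distance=128):
    """Map a relative offset (key position - query position) to a bucket id.

    Sketch of T5-style bucketing for one integer offset.
    max_distance=128 is an assumed default, not taken from the model card.
    """
    bucket = 0
    n = relative_position
    if bidirectional:
        # Half of the buckets serve positive offsets, half non-positive ones.
        num_buckets //= 2
        if n > 0:
            bucket += num_buckets
        n = abs(n)
    else:
        # Causal case: only offsets pointing to the past are distinguished.
        n = max(-n, 0)

    # The first half of the available buckets holds exact offsets.
    max_exact = num_buckets // 2
    if n < max_exact:
        return bucket + n

    # The remaining buckets are spaced logarithmically up to max_distance.
    log_bucket = max_exact + int(
        math.log(n / max_exact)
        / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    )
    return bucket + min(log_bucket, num_buckets - 1)


# Compare how the same offsets are bucketed with 32 and 48 buckets.
for offset in (-1000, -20, -5, 0, 5, 20, 1000):
    print(offset,
          relative_position_bucket(offset, num_buckets=32),
          relative_position_bucket(offset, num_buckets=48))
```

Under these assumptions, the bidirectional case with 48 buckets assigns an exact bucket to every offset below 12 in each direction, compared with offsets below 8 when using 32 buckets, and leaves more logarithmic buckets for longer distances.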

Experiment logs

Training consisted of two phases:

phase one - ~30k steps at context length 512
phase two - 20k steps at context length 1024
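
The two-phase schedule above can be sketched as a simple step-to-context-length lookup. This is only an illustration: the exact phase boundary (30,000 steps, where the card says "~30k") and the names `PHASES` and `context_length_for_step` are assumptions, and batch size is not stated, so totals are given per sequence slot in the batch.

```python
# (number of steps, context length) for each phase.
# The 30_000 figure is an assumption standing in for "~30k steps".
PHASES = [(30_000, 512), (20_000, 1024)]


def context_length_for_step(step):
    """Return the context length used at a given 0-indexed training step."""
    if step < 0:
        raise ValueError("step must be non-negative")
    phase_start = 0
    for num_steps, ctx_len in PHASES:
        if step < phase_start + num_steps:
            return ctx_len
        phase_start += num_steps
    raise ValueError("step is beyond the end of the schedule")


def token_positions_per_sequence_slot():
    """Total token positions seen by one sequence slot of the batch."""
    return sum(num_steps * ctx_len for num_steps, ctx_len in PHASES)


print(context_length_for_step(10_000))      # phase one
print(context_length_for_step(40_000))      # phase two
print(token_positions_per_sequence_slot())
```

Multiplying the per-slot total by the (unstated) batch size would give an upper bound on tokens processed, ignoring padding.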

Safetensors: model size 887M params, tensor type F32
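
As a rough guide to what the parameter count and tensor type imply for storage, the following sketch estimates the size of the weights alone. It treats 887M as a rounded figure and ignores file metadata; the helper name `checkpoint_size_bytes` and the half-precision comparison are illustrative assumptions, not part of the model card.

```python
# Bytes used per parameter for some common tensor types.
BYTES_PER_PARAM = {"F32": 4, "BF16": 2, "F16": 2}


def checkpoint_size_bytes(num_params, dtype="F32"):
    """Approximate weight storage in bytes (metadata overhead ignored)."""
    return num_params * BYTES_PER_PARAM[dtype]


num_params = 887_000_000  # rounded figure from the model card

f32_bytes = checkpoint_size_bytes(num_params, "F32")
bf16_bytes = checkpoint_size_bytes(num_params, "BF16")

print(f"F32:  {f32_bytes / 1e9:.2f} GB ({f32_bytes / 2**30:.2f} GiB)")
print(f"BF16: {bf16_bytes / 1e9:.2f} GB ({bf16_bytes / 2**30:.2f} GiB)")
```

Loading the weights in a 16-bit type would roughly halve the memory needed, at the cost of reduced numerical precision.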
