pszemraj committed
Commit bbbb8d2
1 Parent(s): c881248

Update README.md

Files changed (1): README.md +3 -3
README.md CHANGED
@@ -12,13 +12,13 @@ library_name: transformers
 # tFINE-850m-24x24-1024ctx
 
-Pretrained T5 model with nanoT5:
+Pretrained T5 model with [nanoT5](https://github.com/pszemraj/nanoT5/tree/fineweb-edu-test):
 
 - ~850m parameters, 24 layers in encoder, 24 layers in decoder
 - sentencepiece tokenizer with 48k vocab & byte-pair fallback
-- handles whitespaces etc correctly (unlike standard T5 tokenizer)
+- handles whitespaces etc correctly (_unlike original T5 tokenizer_)
 - 1024 ctx during pretrain
-- `relative_attention_num_buckets` increased to 48 from standard 32 for context length upscaling
+- `relative_attention_num_buckets` increased to 48 from 32 for context length upscaling
 
 ## Experiment logs
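The hyperparameters the README lists can be expressed as a `transformers` `T5Config` for reference. This is a minimal sketch of only the values stated in the diff (24 encoder layers, 24 decoder layers, ~48k vocab, `relative_attention_num_buckets=48`); the exact vocab size and the unstated dimensions (`d_model`, `d_ff`, `num_heads`) are assumptions, not the checkpoint's actual config.

```python
from transformers import T5Config

# Sketch of the config changes described in the diff. Values not stated
# in the README (exact vocab size, hidden dims) are illustrative only.
config = T5Config(
    vocab_size=48_000,                  # "48k vocab" (exact size assumed)
    num_layers=24,                      # encoder layers
    num_decoder_layers=24,              # decoder layers
    relative_attention_num_buckets=48,  # raised from the T5 default of 32
)

print(config.relative_attention_num_buckets)
```

Raising `relative_attention_num_buckets` gives the relative-position bias more distinct buckets to work with, which is the lever the commit points at for scaling beyond the 1024-token pretraining context.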