BEE-spoke-data/slimpajama_tok-48128-BPE-forT5

pszemraj committed on Aug 7, 2024

Commit daab53c (1 parent: b41b3ca)

Update README.md

Files changed (1)

README.md +8 -5

@@ -1,17 +1,20 @@
 ---
 library_name: transformers
 license: mit
 ---
-# Model Card for Model ID
-adapted for t5
 Tokens:
 	`['▁In', '▁', '2', '0', '2', '3', ',', '▁Dr', '.', '▁Jane', '▁Smith', '-', 'John', 'son', '▁published', '▁groundbreaking', '▁research', '▁on', '▁quantum', '▁ent', 'ang', 'lement', ',', '▁demonstrating', '▁a', '▁', '9', '9', '.', '9', '%', '▁success', '▁rate', '▁in', '▁tele', 'port', 'ing', '▁qu', 'bits', '▁over', '▁', '1', '0', '0', 'km', '▁using', '▁her', '▁patented', "▁'", 'Q', '-', 'Link', "'", '▁technology', '.', '</s>']`
-- Compression ratio: 3.54
-- Vocabulary size: 48228
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/60bccec062080d33f875cd0c/KL4UbQpJESQgnAf3FTtiS.png)

 ---
 library_name: transformers
 license: mit
+language:
+- en
 ---
+# 48k vocab LlamaTokenizer for T5
+custom tokenizer from [scaling study](https://huggingface.co/sail/scaling-with-vocab-trained-tokenizers) adapted for T5 training
+- Compression ratio: 3.54
+- Vocabulary size: 48228
 Tokens:
 	`['▁In', '▁', '2', '0', '2', '3', ',', '▁Dr', '.', '▁Jane', '▁Smith', '-', 'John', 'son', '▁published', '▁groundbreaking', '▁research', '▁on', '▁quantum', '▁ent', 'ang', 'lement', ',', '▁demonstrating', '▁a', '▁', '9', '9', '.', '9', '%', '▁success', '▁rate', '▁in', '▁tele', 'port', 'ing', '▁qu', 'bits', '▁over', '▁', '1', '0', '0', 'km', '▁using', '▁her', '▁patented', "▁'", 'Q', '-', 'Link', "'", '▁technology', '.', '</s>']`
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/60bccec062080d33f875cd0c/KL4UbQpJESQgnAf3FTtiS.png)
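
The README reports a compression ratio but does not say how it is measured. As a rough illustration only, the sketch below assumes the ratio means characters per token and applies that to the example pieces listed under `Tokens:`. The helper names `detokenize` and `chars_per_token` are hypothetical, and a single sentence will not necessarily reproduce the 3.54 figure, which was presumably computed over a larger corpus.

```python
# Token pieces copied from the README example; the trailing '</s>' is the
# end-of-sequence marker and is excluded from the counts below.
pieces = ['▁In', '▁', '2', '0', '2', '3', ',', '▁Dr', '.', '▁Jane', '▁Smith',
          '-', 'John', 'son', '▁published', '▁groundbreaking', '▁research',
          '▁on', '▁quantum', '▁ent', 'ang', 'lement', ',', '▁demonstrating',
          '▁a', '▁', '9', '9', '.', '9', '%', '▁success', '▁rate', '▁in',
          '▁tele', 'port', 'ing', '▁qu', 'bits', '▁over', '▁', '1', '0', '0',
          'km', '▁using', '▁her', '▁patented', "▁'", 'Q', '-', 'Link', "'",
          '▁technology', '.', '</s>']


def detokenize(pieces, eos="</s>"):
    """Rebuild text from SentencePiece-style pieces ('▁' marks a word boundary)."""
    body = [p for p in pieces if p != eos]
    return "".join(body).replace("▁", " ").lstrip(" ")


def chars_per_token(pieces, eos="</s>"):
    """ASSUMPTION: 'compression ratio' is taken to mean characters per token."""
    n_tokens = sum(1 for p in pieces if p != eos)
    return len(detokenize(pieces, eos)) / n_tokens


text = detokenize(pieces)
ratio = chars_per_token(pieces)
print(text)
print(f"{ratio:.2f} characters per token on this one sentence")
```

If the ratio was actually measured in bytes per token, replacing `len(detokenize(...))` with the length of the UTF-8 encoded text would be the corresponding change.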