---
license: mit
language:
- en
tags:
- text generation
datasets:
- fhswf/TinyStoriesV2_cleaned
---
BPE Tokenizer for TinyStoriesV2
---
Based on the GPT-Neo BPE tokenizer, but with a smaller vocabulary.
Trained on TinyStoriesV2 (`fhswf/TinyStoriesV2_cleaned`).

- Vocab size: 4096
- 256 base characters (the byte-level alphabet)
- 1 extra token: `<|endoftext|>`
- 3839 merges
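
The numbers above add up because each BPE merge contributes exactly one new token: 256 + 1 + 3839 = 4096. The following is a minimal sketch of that bookkeeping, not the code used to train this tokenizer. It assumes a byte-level base alphabet, and the helper names `num_merges` and `train_toy_bpe` are hypothetical. The toy trainer also skips the pre-tokenization step that real BPE tokenizers apply before counting pairs.

```python
from collections import Counter

BASE_SIZE = 256                      # assumed: one base token per byte value
SPECIAL_TOKENS = ["<|endoftext|>"]   # the single extra token listed above
TARGET_VOCAB_SIZE = 4096


def num_merges(vocab_size, base_size=BASE_SIZE, n_special=len(SPECIAL_TOKENS)):
    """Merges left to learn once base and special tokens are accounted for."""
    return vocab_size - base_size - n_special


def train_toy_bpe(text, n_merges):
    """Learn up to n_merges merges over the raw UTF-8 bytes of text.

    Toy illustration only: no pre-tokenization, no tie-breaking rules.
    """
    seq = [bytes([b]) for b in text.encode("utf-8")]
    merges = []
    for _ in range(n_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing repeats, so further merges would not compress
        merges.append((left, right))
        # Replace every non-overlapping occurrence of the pair, left to right.
        merged, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == left and seq[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
    return merges


print(num_merges(TARGET_VOCAB_SIZE))  # 3839

toy_merges = train_toy_bpe("the cat and the hat and the bat", n_merges=5)
toy_vocab_size = BASE_SIZE + len(SPECIAL_TOKENS) + len(toy_merges)
print(toy_vocab_size)
```

With a large enough corpus the trainer never runs out of repeated pairs, so the final vocabulary size is set entirely by the merge budget.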