English
text generation

BPE Tokenizer for TinyStoriesV2

Based on the GPT-Neo BPE tokenizer, but with a smaller vocabulary. Trained on TinyStoriesV2.

  • Vocab size: 4096
  • 256 base characters
  • 1 extra token: <|endoftext|>
  • 3839 merges
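
The numbers above add up because each BPE merge contributes exactly one new token on top of the base alphabet and the special tokens. A minimal sketch of that arithmetic follows; the constant and function names are hypothetical and chosen only for illustration, they are not part of the tokenizer files.

```python
# Vocabulary composition as listed in the model card.
# All names here are illustrative assumptions, not identifiers from the tokenizer.
BASE_TOKENS = 256                    # base characters
SPECIAL_TOKENS = ["<|endoftext|>"]   # the single extra token
MERGES = 3839                        # learned BPE merges


def vocab_size(base: int, special: list[str], merges: int) -> int:
    """Each merge adds one token, so the total is a plain sum."""
    return base + len(special) + merges


total = vocab_size(BASE_TOKENS, SPECIAL_TOKENS, MERGES)
print(total)  # 4096
```

Put the other way around, a target vocabulary of 4096 with 256 base characters and one special token leaves room for 4096 - 256 - 1 = 3839 merges.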

Dataset used to train fhswf/BPE_GPT2_TinyStoriesV2_cleaned_4096