metadata
library_name: transformers
datasets:
  - HuggingFaceTB/smollm-corpus

Doge-tokenizer

Tokenizer for training models on smollm-corpus. It was trained on 2M samples drawn from the following sources, in these proportions:

  • FineWeb-Edu 70%
  • Cosmopedia v2 20%
  • Python-Edu 5%
  • FineMath 5%
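
To make the mix above concrete, here is a minimal sketch of the per-source sample counts it implies. It assumes that "2M" means exactly 2,000,000 samples and that the percentages are shares of the sample count (not of tokens or bytes); neither is stated in this README. The names `TOTAL_SAMPLES`, `MIX_PERCENT`, and `samples_per_source` are hypothetical and used only for illustration.

```python
# Assumption: "2M" is taken as exactly 2,000,000 samples.
TOTAL_SAMPLES = 2_000_000

# Percentages as listed above; assumed to be shares of the sample count.
MIX_PERCENT = {
    "FineWeb-Edu": 70,
    "Cosmopedia v2": 20,
    "Python-Edu": 5,
    "FineMath": 5,
}


def samples_per_source(total: int, mix_percent: dict[str, int]) -> dict[str, int]:
    """Split `total` samples across sources according to integer percentages."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("mix percentages must sum to 100")
    # Integer arithmetic avoids floating-point rounding in the counts.
    return {name: total * pct // 100 for name, pct in mix_percent.items()}


counts = samples_per_source(TOTAL_SAMPLES, MIX_PERCENT)
for name, n in counts.items():
    print(f"{name}: {n:,} samples")
```

Under these assumptions the split works out to 1,400,000 samples from FineWeb-Edu, 400,000 from Cosmopedia v2, and 100,000 each from Python-Edu and FineMath.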