Upload tokenizer.json

#2

The persisted tokenizer.json does not include the template processor that adds special tokens. transformers overrides the processor on load, but when tokenizer.json is loaded directly with the Rust tokenizers library it is useful to have the processor already in place (this has worked so far for other models). This PR simply re-saves the tokenizer so that it matches exactly what transformers loads.
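
For illustration, a minimal sketch of the kind of direct load this helps with, using the Python bindings of the Rust tokenizers library (the local path and the sample text are placeholders, not part of this PR):

from tokenizers import Tokenizer
from transformers import AutoTokenizer

# Load tokenizer.json directly, without transformers' override of the processor.
raw = Tokenizer.from_file("tokenizer.json")

# Reference: transformers rebuilds the template processor on load.
hf = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base")

text = "def hello():"
# With the template processor persisted in tokenizer.json, the direct load
# should add the same special tokens, so the ids should match.
assert raw.encode(text).ids == hf(text)["input_ids"]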


Generated with:

from transformers import AutoTokenizer

# Load the fast (Rust-backed) tokenizer; transformers overrides the
# template processor for special tokens at this point.
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base")
assert tokenizer.is_fast
# Re-save so the persisted tokenizer.json includes that processor.
tokenizer.save_pretrained("...")
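
As a quick sanity check (a sketch; the path placeholder is the same as above), the re-saved tokenizer.json should now carry a post_processor entry rather than null:

import json

with open(".../tokenizer.json") as f:
    data = json.load(f)
# The template processor should be persisted in the saved file.
assert data.get("post_processor") is not None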