`tokenizer_config.json` has a duplicate entry for `clean_up_tokenization_spaces`
#17, opened by polarathene
`tokenizer_config.json` has a duplicate entry for `clean_up_tokenization_spaces`, with the first occurrence at the end of the `chat_template` line:
"chat_template": "{{bos_token}}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}", "clean_up_tokenization_spaces": true,
"clean_up_tokenization_spaces": false,
- 1st occurrence is `true`
- 2nd occurrence is `false`
I'm not sure which is the intended value here; however, mistral.rs will refuse to load the model due to the duplicate `clean_up_tokenization_spaces` key. Other software that accepts it presumably either uses the 1st and ignores the 2nd, or treats the 2nd as an override.
Could you please correct this?
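For anyone who wants to check a local copy for the same problem, here is a minimal sketch using Python's standard `json` module. The helper name `find_duplicate_keys` and the trimmed `sample` string are illustrative only (the `chat_template` value is shortened) and are not taken from the repository. It relies on `object_pairs_hook`, which receives every key/value pair of an object before repeated keys are collapsed.

```python
import json
from collections import Counter


def find_duplicate_keys(text: str) -> list[str]:
    """Return keys that appear more than once within any single JSON object."""
    duplicates: list[str] = []

    def record_pairs(pairs):
        # `pairs` is the ordered list of (key, value) tuples for one object,
        # seen before the parser collapses repeated keys.
        counts = Counter(key for key, _ in pairs)
        duplicates.extend(key for key, n in counts.items() if n > 1)
        return dict(pairs)  # same mapping the default parser would build

    json.loads(text, object_pairs_hook=record_pairs)
    return duplicates


# Trimmed, hypothetical stand-in for the affected tokenizer_config.json.
sample = """{
  "chat_template": "{{bos_token}}",
  "clean_up_tokenization_spaces": true,
  "clean_up_tokenization_spaces": false
}"""

print(find_duplicate_keys(sample))
# With no hook, the standard parser silently keeps the last value.
print(json.loads(sample)["clean_up_tokenization_spaces"])
```

Python's `json` module does not raise on repeated keys and keeps the last pair, so a loader built on it would see `false` here. Stricter parsers can reject the file outright.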
Thank you for pointing it out. The second occurrence of `clean_up_tokenization_spaces` has been removed:
https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B/commit/c9005c2d51dc3e0ff3399c59951b2353767d1d15
Thanks for getting that sorted! ❤️
polarathene changed discussion status to closed