Token Configuration Correction
Good afternoon!
When deploying this model locally, I discovered what I believe to be a bug in the configuration of tokenizer_config.json and config.json.
tokenizer_config.json: The "bos_token" is set to null when I believe it should be set to "<|im_start|>".
config.json: The "bos_token_id" is not set, but I believe it should be set to 151643 for the token "<|im_start|>".
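For anyone who wants to try the two changes on a local copy before this is merged, here is a rough sketch of how they could be applied. The helper name apply_bos_fix and the directory argument are hypothetical, and the token string and ID are simply the values proposed above; it is probably worth confirming the ID against the tokenizer's own vocabulary before relying on it.

```python
import json
from pathlib import Path

# Values as proposed in this PR; they are assumptions, not verified here.
BOS_TOKEN = "<|im_start|>"
BOS_TOKEN_ID = 151643


def apply_bos_fix(model_dir: str) -> None:
    """Hypothetical helper: set the BOS fields in a local model directory."""
    root = Path(model_dir)

    # tokenizer_config.json: replace the null "bos_token" with the token string.
    tok_path = root / "tokenizer_config.json"
    tok_cfg = json.loads(tok_path.read_text(encoding="utf-8"))
    tok_cfg["bos_token"] = BOS_TOKEN
    tok_path.write_text(
        json.dumps(tok_cfg, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )

    # config.json: add the missing "bos_token_id".
    cfg_path = root / "config.json"
    cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
    cfg["bos_token_id"] = BOS_TOKEN_ID
    cfg_path.write_text(json.dumps(cfg, indent=2) + "\n", encoding="utf-8")
```

Calling apply_bos_fix("path/to/local/model") rewrites both files in place and leaves every other key untouched, so it is best run on a copy of the checkpoint directory.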
I've included the proposed changes in this issue. This issue also affects cognitivecomputations/dolphin-2.9.2-qwen2-7b. I'll submit another PR there if we are in agreement here.
Thank you for your contributions; I love these finetunes.
Warm regards,
Ben
Thanks for the PR - I added special tokens when training the model when I didn't need to. Old habits die hard.