Update tokenizer_config.json
#4
opened by minyichen
The current chat_template appends an extra ChatML EOS token (`<|im_end|>`) when `add_generation_prompt=False`. Please replace it with a corrected chat_template to fix this behavior.
```python
from transformers import AutoTokenizer

message = [{"role": "user", "content": "How are you?"}]
tame_tokenizer = AutoTokenizer.from_pretrained("yentinglin/Llama-3-Taiwan-8B-Instruct")

# Render the chat template without tokenizing so the raw string is visible.
print(tame_tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=False))
```
You can see the extra `<|im_end|>` token in the output:

```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHow are you?<|eot_id|><|im_end|>
```
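A minimal sketch of the fix, assuming the intended template is the stock Meta-Llama-3-Instruct one (which closes every turn with `<|eot_id|>` and emits no ChatML tokens); the exact template merged in this PR may differ:

```python
# Sketch: patch the template locally and confirm the ChatML token is gone.
# The Jinja string below is the stock Meta-Llama-3-Instruct chat template,
# used here as an assumed replacement; the template actually merged in this
# PR may differ.
llama3_template = (
    "{% set loop_messages = messages %}"
    "{% for message in loop_messages %}"
    "{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'"
    " + message['content'] | trim + '<|eot_id|>' %}"
    "{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}"
    "{{ content }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}"
)

tame_tokenizer.chat_template = llama3_template
out = tame_tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=False)
print(out)  # ends with <|eot_id|>, no trailing <|im_end|>
assert "<|im_end|>" not in out
```

With `add_generation_prompt=True`, the same template appends the assistant header instead, so generation prompts are still formed correctly.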
minyichen changed pull request title from "Upload tokenizer_config.json" to "Update tokenizer_config.json"
yentinglin changed pull request status to merged