Update chat_template on tokenizer_config

#14

This PR corrects the chat_template in the tokenizer config. The issue is that the system role is not handled correctly: the system message should be prepended to the first user message, not to the last one.

The chat template is copied from https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/commit/43ee8f4afb6fc9e4304a8ed87aaa3a36a0e06939. @Rocketknight1 can you review this and see if the update is valid?

Mistral AI_ org

Only v1 of the tokenizers prepends the system message to the first user message, as you can see here: https://github.com/mistralai/mistral-common/blob/main/src/mistral_common/tokens/tokenizers/sentencepiece.py
v2 and v3 both prepend it to the last user message, so the current template seems correct. Here is the output of mistral_common for v3 with tekken:

<s>[INST]User[/INST]Assistant</s>[INST]System

User[/INST]
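For reference, the output above can be reproduced roughly as follows. This is a minimal sketch based on my understanding of the mistral_common API (MistralTokenizer.v3 with encode_chat_completion); exact class names and import paths may differ between versions of the library.

```python
# Sketch only: render a system/user/assistant/user conversation with
# mistral_common's v3 tekken tokenizer and inspect the resulting string.
from mistral_common.protocol.instruct.messages import (
    AssistantMessage,
    SystemMessage,
    UserMessage,
)
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.v3(is_tekken=True)

tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            SystemMessage(content="System"),
            UserMessage(content="User"),
            AssistantMessage(content="Assistant"),
            UserMessage(content="User"),
        ]
    )
)

# The system text ends up prepended to the *last* user turn:
# <s>[INST]User[/INST]Assistant</s>[INST]System\n\nUser[/INST]
print(tokenized.text)
```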

But how can we then create a simple prompt with a system/user/assistant structure? In that case the system message would be skipped, and it would only appear if the last role is a user, which doesn't make sense in most use cases.

Is it solved?

I think there's nothing to solve here - the chat template on Hugging Face matches the output from mistral_common! If anyone has an example conversation where mistral_common yields a different output than Hugging Face's tokenizer.apply_chat_template(), please reopen this issue and ping me. Until then, closing it.
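If anyone wants to run that check, something along these lines should do it. This is only a sketch: I'm using the Mistral-7B-Instruct-v0.3 repo linked above as a stand-in model ID, so swap in the ID of the repo this PR belongs to, and note that gated repos may require logging in first.

```python
# Sketch: render the same conversation with the Hub chat template so it can
# be compared against the mistral_common output from the snippet above.
from transformers import AutoTokenizer

messages = [
    {"role": "system", "content": "System"},
    {"role": "user", "content": "User"},
    {"role": "assistant", "content": "Assistant"},
    {"role": "user", "content": "User"},
]

# Model ID is an assumption (the v0.3 repo linked earlier); replace it with
# the repo whose tokenizer_config.json this PR targets.
hf_tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
hf_text = hf_tokenizer.apply_chat_template(messages, tokenize=False)
print(hf_text)

# If this string differs from what mistral_common produces for the same
# conversation, that's the example to reopen the issue with.
```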

Rocketknight1 changed pull request status to closed
