Template clarification.
Hi! Really interesting model, but can you please clarify what you mean by "Mistral-V3 Tekken"? Is it compliant with https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/chat_templates.md#tekken-instruct-chat-template ?
I'm asking because I don't see the difference between using whitespace and not using it: the model always starts from either an empty string or a whitespace, so it looks closer to the non-Tekken V3 template.
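To make the question concrete, here is a minimal sketch of the two layouts as I understand them from the cookbook page linked above. The function names and the `turns` structure are made up for illustration, and system prompts and tool calls are left out, so treat it as an assumption rather than the official template.

```python
# Sketch only: render_v3 / render_v3_tekken are hypothetical helpers,
# not part of any Mistral library.

def render_v3(turns):
    """Non-Tekken V3 style: a space after [INST] and before each reply."""
    out = "<s>"
    for user, assistant in turns:
        out += f"[INST] {user}[/INST]"
        if assistant is not None:
            out += f" {assistant}</s>"
    return out


def render_v3_tekken(turns):
    """V3 Tekken style: no spaces around the control tokens."""
    out = "<s>"
    for user, assistant in turns:
        out += f"[INST]{user}[/INST]"
        if assistant is not None:
            out += f"{assistant}</s>"
    return out


# Each turn is (user message, assistant reply); None means the reply
# has not been generated yet.
turns = [("Hi", "Hello!"), ("How are you?", None)]
print(repr(render_v3(turns)))
print(repr(render_v3_tekken(turns)))
```

If that reading is right, the only difference is the space after `[INST]` and before each assistant reply.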
Thank you for the clarification and the fix!
I'll definitely try v0.2 out once `q4_k_l` or `q5_k_l` quants appear, they help with quality enough to be noticeable against `_k_m`.
FYI, I'm suddenly getting more refusals on standard V3 and almost no refusals on V3 Tekken. No idea why or how.
Here are 4_K_L and 5_K_L quants. Tried to make them myself this time for a change.
For the chat template, yeah, I've got no idea either lol. This is the one I use at the moment: https://qu.ax/UbUc.json
Many thanks! I see you use `\n` before `[INST]` - does it help in your experience? I used to do the same with 7B and Nemo, but Mistral Small Instruct didn't like it for some reason.
It's more or less the same as without the newline, although I like the output with it a bit more. Probably just a placebo thing.