Template clarification.

#1
by notafraud - opened

Hi! Really interesting model, but can you please clarify what you mean by "Mistral-V3 Tekken"? Is it compliant with https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/chat_templates.md#tekken-instruct-chat-template ?

I'm asking because I don't see any difference between using whitespace and not using it: the model always starts its reply with either an empty string or a whitespace, so it behaves more like the non-Tekken V3 template.
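For reference, here's how I read the difference on the cookbook page linked above: V3 puts a space after `[INST]` and after `[/INST]`, while V3-Tekken drops both. This is a minimal sketch, assuming plain-text prompt building where `<s>`, `</s>`, `[INST]` and `[/INST]` are written as literal strings (a real tokenizer maps them to control tokens). `format_mistral` is a hypothetical helper, not part of any Mistral library.

```python
from typing import List, Tuple

BOS, EOS = "<s>", "</s>"

def format_mistral(turns: List[Tuple[str, str]], next_user: str, tekken: bool) -> str:
    """Build a prompt from finished (user, assistant) turns plus a new user message.

    Assumed layouts (my reading of the cookbook, not verified for every model):
      V3:        [INST] user[/INST] assistant</s>
      V3-Tekken: [INST]user[/INST]assistant</s>
    """
    # The only difference between the two templates is this separator.
    sp = "" if tekken else " "
    parts = [BOS]
    for user, assistant in turns:
        parts.append(f"[INST]{sp}{user}[/INST]{sp}{assistant}{EOS}")
    # Leave the prompt open after [/INST] so the model writes the next reply.
    parts.append(f"[INST]{sp}{next_user}[/INST]")
    return "".join(parts)
```

With `tekken=False` the prompt ends in `[/INST]` and the model is expected to continue with a leading space, which is why its reply "starting with a whitespace" looks like V3 rather than Tekken.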

For some reason I thought Mistral Small was using Tekken instead of just V3.

Fixed it on the v0.2 model card. Check it out if you liked v0.1.

Thank you for the clarification and the fix!

I'll definitely try v0.2 out once q4_k_l or q5_k_l quants appear; they improve quality noticeably compared to the _k_m ones.

notafraud changed discussion status to closed

FYI: I'm suddenly getting more refusals with standard V3 and almost none with V3 Tekken. No idea why or how.

Here are Q4_K_L and Q5_K_L quants. I tried making them myself this time, for a change.

As for the chat template, yeah, I've got no idea either lol. This is the one I'm using at the moment: https://qu.ax/UbUc.json

Many thanks! I see you use \n before [INST] - does it help in your experience? I used to do the same with 7B and Nemo, but Mistral Small Instruct didn't like it for some reason.

It's more or less the same as without the newline, although I like the output with it a bit more. Probably just a placebo thing.
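To make the newline tweak concrete: a minimal sketch, assuming the `\n` goes before every `[INST]` except the very first one (the shared JSON isn't reproduced here, so the exact placement is a guess). `newline_before_inst` is a hypothetical helper that post-processes an already-built prompt string.

```python
def newline_before_inst(prompt: str) -> str:
    """Insert a newline before every [INST] except the first (hypothetical helper)."""
    # Split off everything up to and including the first [INST]; keep it untouched.
    head, sep, rest = prompt.partition("[INST]")
    # Every later [INST] opens a new user turn, so prefix it with a newline.
    return head + sep + rest.replace("[INST]", "\n[INST]")
```

If the prompt contains no `[INST]` at all, `partition` returns the whole string as `head` and the prompt comes back unchanged.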
