Template clarification.

by notafraud - opened 23 days ago

23 days ago

Hi! Really interesting model, but can you please clarify what you mean by "Mistral-V3 Tekken"? Is it compliant with https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/chat_templates.md#tekken-instruct-chat-template ?

I'm asking because I don't see the difference between using whitespace and not using it - the model's reply always starts from either an empty string or a space, so it's closer to the non-Tekken V3 template.
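For reference, a minimal sketch of the whitespace difference in question, assuming the V3 and V3-Tekken layouts described on the linked Mistral cookbook page. The `render` helper and the sample turns are hypothetical and only for illustration; real tokenizers insert the control tokens as special token IDs rather than plain strings, so treat this as a picture of the text layout, not as the official implementation.

```python
BOS, EOS = "<s>", "</s>"

def render(turns, tekken):
    """Render (user, assistant_or_None) pairs as a Mistral-style prompt string.

    Assumption: V3 puts a single space after [INST] and after [/INST],
    while V3-Tekken puts no whitespace there at all.
    """
    pad = "" if tekken else " "
    out = BOS
    for user, assistant in turns:
        out += f"[INST]{pad}{user}[/INST]"
        if assistant is not None:
            # Completed assistant turns are closed with the EOS token.
            out += f"{pad}{assistant}{EOS}"
    return out

turns = [("Hi", "Hello!"), ("How are you?", None)]
v3 = render(turns, tekken=False)
v3_tekken = render(turns, tekken=True)

print(v3)         # <s>[INST] Hi[/INST] Hello!</s>[INST] How are you?[/INST]
print(v3_tekken)  # <s>[INST]Hi[/INST]Hello!</s>[INST]How are you?[/INST]
```

Under these assumptions, the two prompts differ only in the single space after `[INST]` and `[/INST]`.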

Nohobby

Owner 23 days ago

For some reason I thought Mistral Small was using Tekken instead of just V3.

Fixed it on the v0.2 model card. Check it out if you liked v0.1.

notafraud

23 days ago

Thank you for the clarification and the fix!

I'll definitely try v0.2 out once q4_k_l or q5_k_l quants appear; they improve quality enough to be noticeable compared to _k_m.

notafraud changed discussion status to closed 23 days ago

notafraud

22 days ago

FYI, I'm suddenly getting more refusals with standard V3 and almost none with V3 Tekken. No idea why or how.

Nohobby

Owner 22 days ago

Here are 4_K_L and 5_K_L quants. Tried to make them myself this time for a change.

As for the chat template: yeah, I've got no idea either lol. This is the one I use at the moment: https://qu.ax/UbUc.json

notafraud

22 days ago

Many thanks! I see you use \n before [INST] - does it help in your experience? I used to do the same with 7B and Nemo, but Mistral Small Instruct didn't like it for some reason.

Nohobby

Owner 22 days ago

It's more or less the same as without the newline, although I like the output with it a bit more. Probably just a placebo thing.
