Pixtral-Large-Instruct-2411 🧡 ExLlamaV2 3.0bpw Quant

Vision inputs work on the dev branch of ExLlamaV2.

Tokenizer And Prompt Template

Uses a conversion of the v7m1 tokenizer with a 32k vocabulary.

The chat template in chat_template.json follows the v7 instruct format:

<s>[SYSTEM_PROMPT] <system prompt>[/SYSTEM_PROMPT][INST] <user message>[/INST] <assistant response></s>[INST] <user message>[/INST]
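
For anyone assembling prompts by hand rather than through chat_template.json, the format above can be sketched in plain Python. This is a minimal illustration, not the template shipped in the repo: the function name `build_v7_prompt` and the message dict layout are assumptions, spacing is copied from the template line as written, and image inputs are not covered.

```python
def build_v7_prompt(messages, system_prompt=None):
    """Render a text-only conversation in the v7 instruct layout shown above.

    `messages` is a list of {"role": "user" | "assistant", "content": str}
    dicts (a hypothetical layout chosen for this sketch).
    """
    parts = ["<s>"]
    if system_prompt is not None:
        parts.append(f"[SYSTEM_PROMPT] {system_prompt}[/SYSTEM_PROMPT]")
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role == "user":
            parts.append(f"[INST] {content}[/INST]")
        elif role == "assistant":
            # A completed assistant turn is closed with the end-of-sequence token.
            parts.append(f" {content}</s>")
        else:
            raise ValueError(f"unsupported role: {role!r}")
    return "".join(parts)


if __name__ == "__main__":
    print(build_v7_prompt(
        [
            {"role": "user", "content": "Hi"},
            {"role": "assistant", "content": "Hello"},
            {"role": "user", "content": "Bye"},
        ],
        system_prompt="Be brief",
    ))
```

If the resulting string is passed to a tokenizer that adds the beginning-of-sequence token on its own, the leading `<s>` would likely need to be dropped to avoid a duplicate.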

Repo	Bits	Head Bits	Size
nintwentydo/Pixtral-Large-Instruct-2411-exl2-2.0bpw	2.0	6.0	35.18 GB
nintwentydo/Pixtral-Large-Instruct-2411-exl2-2.5bpw	2.5	6.0	39.34 GB
nintwentydo/Pixtral-Large-Instruct-2411-exl2-3.0bpw	3.0	6.0	46.42 GB
nintwentydo/Pixtral-Large-Instruct-2411-exl2-3.5bpw	3.5	6.0	53.50 GB
nintwentydo/Pixtral-Large-Instruct-2411-exl2-4.0bpw	4.0	6.0	60.61 GB
nintwentydo/Pixtral-Large-Instruct-2411-exl2-4.5bpw	4.5	6.0	67.68 GB
nintwentydo/Pixtral-Large-Instruct-2411-exl2-5.0bpw	5.0	6.0	74.76 GB
nintwentydo/Pixtral-Large-Instruct-2411-exl2-6.0bpw	6.0	8.0	88.81 GB
nintwentydo/Pixtral-Large-Instruct-2411-exl2-8.0bpw	8.0	8.0	97.51 GB
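
As a rough aid for choosing among the repos above, the sketch below picks the highest-bitrate quant whose listed size fits a given memory budget. The sizes are copied from the table; the helper name `pick_quant` and the default 8 GB headroom are assumptions for illustration only. Real memory use also depends on context length, cache settings, and the vision tower, so treat the result as a starting point rather than a guarantee.

```python
# (repo, bits per weight, listed size in GB), copied from the table above.
QUANTS = [
    ("nintwentydo/Pixtral-Large-Instruct-2411-exl2-2.0bpw", 2.0, 35.18),
    ("nintwentydo/Pixtral-Large-Instruct-2411-exl2-2.5bpw", 2.5, 39.34),
    ("nintwentydo/Pixtral-Large-Instruct-2411-exl2-3.0bpw", 3.0, 46.42),
    ("nintwentydo/Pixtral-Large-Instruct-2411-exl2-3.5bpw", 3.5, 53.50),
    ("nintwentydo/Pixtral-Large-Instruct-2411-exl2-4.0bpw", 4.0, 60.61),
    ("nintwentydo/Pixtral-Large-Instruct-2411-exl2-4.5bpw", 4.5, 67.68),
    ("nintwentydo/Pixtral-Large-Instruct-2411-exl2-5.0bpw", 5.0, 74.76),
    ("nintwentydo/Pixtral-Large-Instruct-2411-exl2-6.0bpw", 6.0, 88.81),
    ("nintwentydo/Pixtral-Large-Instruct-2411-exl2-8.0bpw", 8.0, 97.51),
]


def pick_quant(vram_gb, headroom_gb=8.0):
    """Return the repo with the most bits per weight that fits, or None.

    `headroom_gb` is a placeholder allowance for cache and activations
    (an assumed value, not a measured one).
    """
    budget = vram_gb - headroom_gb
    fitting = [q for q in QUANTS if q[2] <= budget]
    if not fitting:
        return None
    return max(fitting, key=lambda q: q[1])[0]


if __name__ == "__main__":
    print(pick_quant(48))
    print(pick_quant(96))
```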