---
language:
- en
- fr
- de
- es
- it
- pt
- zh
- ja
- ru
- ko
license: other
license_name: mrl
base_model: mistralai/Pixtral-Large-Instruct-2411
base_model_relation: quantized
inference: false
license_link: https://mistral.ai/licenses/MRL-0.1.md
library_name: transformers
pipeline_tag: image-text-to-text
---

# Pixtral-Large-Instruct-2411 🧡 ExLlamaV2 3.5bpw Quant

A 3.5bpw quant of [Pixtral-Large-Instruct](https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411).

Vision inputs work on the dev branch of [ExLlamaV2](https://github.com/turboderp/exllamav2/tree/dev).

***21 Dec 2024:** This model has been a LOT of fun to experiment and learn with. The model card below has been updated with the changes made to this repo over the last week.*

## Architecture Differences to Pixtral 12B
Pixtral 12B has bias keys for the multi_modal_projector layers, whereas Pixtral Large does not. Rather than adding these keys with low or zero values, this conversion omits them, matching the keys present in Mistral's original Pixtral Large upload. The model's config.json sets `"multimodal_projector_bias": false` to flag this. *n.b. If anyone in the community confirms that initializing these keys with zero values is the better approach, I'm happy to reupload with them included.*
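As a rough illustration of what the `multimodal_projector_bias` flag implies for a loader, the sketch below derives the projector tensors a checkpoint should contain and reports any mismatch. The Llava-style key names (`multi_modal_projector.linear_1` / `linear_2`), the default of `True` when the flag is absent, and the helper names are all assumptions for illustration, not taken from this repo. Note that a zero bias adds nothing numerically (Wx + 0 = Wx), so the choice mostly matters for loaders that expect a fixed set of keys.

```python
import json
from pathlib import Path

# Hypothetical helpers; the Llava-style key layout below is an assumption.
PREFIX = "multi_modal_projector."

def load_config(model_dir: str) -> dict:
    return json.loads((Path(model_dir) / "config.json").read_text(encoding="utf-8"))

def expected_projector_keys(config: dict) -> set[str]:
    # Assumed default: configs without the flag (Pixtral 12B style) expect biases.
    has_bias = config.get("multimodal_projector_bias", True)
    keys = set()
    for layer in ("linear_1", "linear_2"):
        keys.add(f"{PREFIX}{layer}.weight")
        if has_bias:
            keys.add(f"{PREFIX}{layer}.bias")
    return keys

def missing_or_unexpected(config: dict, checkpoint_keys: set[str]) -> tuple[set[str], set[str]]:
    # Compare only the projector tensors; everything else is ignored here.
    expected = expected_projector_keys(config)
    present = {k for k in checkpoint_keys if k.startswith(PREFIX)}
    return expected - present, present - expected
```

With `"multimodal_projector_bias": false`, a checkpoint that still carries a `linear_1.bias` tensor would show up in the "unexpected" set rather than failing silently.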

## Tokenizer
This model uses a conversion of the Mistral v7m1 tokenizer. Pixtral 12B and Pixtral Large use different tokenizers with different vocabulary sizes, so make sure you use the right one.
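One quick sanity check, sketched below, is to compare the number of token ids in `tokenizer.json` with the language model's `vocab_size` in `config.json`. This assumes a Hugging Face BPE-style `tokenizer.json` (a `model.vocab` token-to-id map plus `added_tokens`) and a Llava-style config that nests the language model under `text_config`; the helper names are hypothetical. An exact match is also an assumption: some checkpoints pad the embedding table, in which case a slightly smaller tokenizer count can still be correct.

```python
import json
from pathlib import Path

def tokenizer_vocab_size(tokenizer_json: dict) -> int:
    """Number of token ids defined by a BPE-style Hugging Face tokenizer.json."""
    ids = set(tokenizer_json["model"]["vocab"].values())
    # Special tokens may only appear in "added_tokens", so count them too.
    ids.update(t["id"] for t in tokenizer_json.get("added_tokens", []))
    return max(ids) + 1

def config_vocab_size(config_json: dict) -> int:
    # Llava-style multimodal configs nest the language model under "text_config".
    return config_json.get("text_config", config_json)["vocab_size"]

def tokenizer_matches(model_dir: str) -> bool:
    root = Path(model_dir)
    tok = json.loads((root / "tokenizer.json").read_text(encoding="utf-8"))
    cfg = json.loads((root / "config.json").read_text(encoding="utf-8"))
    return tokenizer_vocab_size(tok) == config_vocab_size(cfg)
```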

## Prompting / Chat Template
The included chat_template.json supports all of Mistral's defined features, plus some of my own additions.

I believe this implementation gives a lot of flexibility in how the model can be used, and it has worked well in my testing.

Example *(line breaks added for readability)*
```
<s>[SYSTEM_PROMPT] <system prompt>[/SYSTEM_PROMPT]
[INST] [IMG]<user message>
[AVAILABLE_TOOLS] [<tool definitions>][/AVAILABLE_TOOLS][/INST]
[IMG]<assistant response>
[TOOL_CALLS] [<tool calls>][/TOOL_CALLS]
[TOOL_RESULTS] <tool results including images>[/TOOL_RESULTS]
</s>[INST] <user message>[/INST]
```

**System Prompts**:  
Messages with role "system" are rendered as `[SYSTEM_PROMPT] <content>[/SYSTEM_PROMPT]` wherever they appear in the chat history.

This appears to work well for passing extra instructions at various depths of the conversation, and it keeps instructions separate from the conversation itself.

**Allowing Non-Alternating Roles**:  
Multiple user messages in a row can be provided, and each one is wrapped in its own `[INST]...[/INST]`. This could work well in group conversations, or in environments where several user messages can arrive before the model is invoked. Closing each message with its own `[/INST]` appeared to help the model focus on the last message instead of thinking it needs to respond to every previous one, while still retaining knowledge of the messages that came before it.
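To make the system-prompt and non-alternating behaviour concrete, here is a simplified renderer that follows the spacing in the example prompt above: system messages at any depth, one `[INST] ...[/INST]` per user message, and one `[IMG]` placeholder per attached image. It is a hand-written approximation for illustration only, not the repo's chat_template.json; tool definitions, tool calls, and tool results are deliberately left out, and the `Message` type is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str        # "system", "user" or "assistant"
    content: str
    images: int = 0  # number of attached images; each becomes an [IMG] placeholder

def render(messages: list[Message]) -> str:
    parts = ["<s>"]
    for m in messages:
        imgs = "[IMG]" * m.images
        if m.role == "system":
            # System messages are allowed anywhere in the history, not only first.
            parts.append(f"[SYSTEM_PROMPT] {m.content}[/SYSTEM_PROMPT]")
        elif m.role == "user":
            # Every user message gets its own [INST]...[/INST], so consecutive
            # user turns are allowed.
            parts.append(f"[INST] {imgs}{m.content}[/INST]")
        elif m.role == "assistant":
            parts.append(f"{imgs}{m.content}</s>")
        else:
            raise ValueError(f"unsupported role: {m.role}")
    return "".join(parts)
```

For example, two user messages in a row render as `<s>[INST] a[/INST][INST] b[/INST]`, leaving the model to reply to the last one.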

**Image Inputs Everywhere**:  
Images can now be sent in user, assistant, and tool result messages, and this seems to actually work. In one test I included an image in an assistant reply 10-15 messages back in the conversation, asked the assistant to recall which image it had previously sent, and it described the image accurately.

This flexibility could enable interesting applications. For example, if you define a tool for image generation:
- the tool is invoked and calls an image generation API/model
- the generated image is returned inside the tool result message
- the model responds with a message that has the generated image in context
- you can keep discussing the generated image, or request revisions, with the model actually knowing what was created
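The flow in the list above can be sketched as a single round trip that appends an assistant tool call and a tool result carrying the image to the chat history, after which the model would be invoked again on the extended history. The message shapes (OpenAI/Hugging Face-style `tool_calls`, a `tool` role, and `{"type": "image"}` content parts), the `generate_image` tool name, and the call id are all hypothetical; check them against the fields the included chat template actually reads.

```python
from typing import Any, Callable

Message = dict[str, Any]

def image_tool_round(
    history: list[Message],
    prompt: str,
    generate_image: Callable[[str], bytes],
    call_id: str = "call00001",  # hypothetical id linking the call to its result
) -> list[Message]:
    """Append one hypothetical image-generation round trip to a chat history."""
    # 1. The model decides to call the (hypothetical) generate_image tool.
    history.append({
        "role": "assistant",
        "tool_calls": [{
            "id": call_id,
            "type": "function",
            "function": {"name": "generate_image", "arguments": {"prompt": prompt}},
        }],
    })
    # 2. The image generation API/model runs outside the LLM.
    image = generate_image(prompt)
    # 3. The image goes back inside the tool result so the model can see it.
    history.append({
        "role": "tool",
        "tool_call_id": call_id,
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": f"Generated image for: {prompt}"},
        ],
    })
    return history
```

Because the image lives in the history rather than only in an external system, later turns asking for revisions can refer to what was actually generated.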

## Usage
Works in TabbyAPI with the dev branch of ExLlamaV2.
<img src="https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411/resolve/main/image-input-example.jpg">
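TabbyAPI serves an OpenAI-compatible chat completions API, so an image can be sent as a base64 data URL inside an `image_url` content part. The sketch below assumes a local server at `http://127.0.0.1:5000`, Bearer-token auth, and a model name matching this repo's folder; all three are assumptions, and vision support may need to be enabled in the server's config, so check the TabbyAPI documentation for your version.

```python
import base64
import json
import urllib.request

def build_vision_request(prompt: str, image_bytes: bytes, mime: str = "image/jpeg",
                         model: str = "Pixtral-Large-Instruct-2411-exl2-3.5bpw") -> dict:
    # OpenAI-style user message mixing text and a base64 data-URL image.
    data_url = f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,  # hypothetical name; use whatever the server has loaded
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
        "max_tokens": 512,
    }

def send(payload: dict, base_url: str = "http://127.0.0.1:5000", api_key: str = "") -> dict:
    # Assumed local endpoint and auth header; adjust for your deployment.
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would look like `send(build_vision_request("Describe this image.", open("photo.jpg", "rb").read()), api_key="...")`, with the reply text under `choices[0]["message"]["content"]` in the OpenAI response format.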

## Available Sizes
| Repo | Bits per weight | Head bits | Size |
| ----------- | ------ | ------ | ------ |
| [nintwentydo/Pixtral-Large-Instruct-2411-exl2-2.0bpw](https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411-exl2-2.0bpw) | 2.0 | 6.0 | 35.18 GB |
| [nintwentydo/Pixtral-Large-Instruct-2411-exl2-2.5bpw](https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411-exl2-2.5bpw) | 2.5 | 6.0 | 39.34 GB |
| [nintwentydo/Pixtral-Large-Instruct-2411-exl2-3.0bpw](https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411-exl2-3.0bpw) | 3.0 | 6.0 | 46.42 GB |
| [nintwentydo/Pixtral-Large-Instruct-2411-exl2-3.5bpw](https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411-exl2-3.5bpw) | 3.5 | 6.0 | 53.50 GB |
| [nintwentydo/Pixtral-Large-Instruct-2411-exl2-4.0bpw](https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411-exl2-4.0bpw) | 4.0 | 6.0 | 60.61 GB |
| [nintwentydo/Pixtral-Large-Instruct-2411-exl2-4.5bpw](https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411-exl2-4.5bpw) | 4.5 | 6.0 | 67.68 GB |
| [nintwentydo/Pixtral-Large-Instruct-2411-exl2-5.0bpw](https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411-exl2-5.0bpw) | 5.0 | 6.0 | 74.76 GB |
| [nintwentydo/Pixtral-Large-Instruct-2411-exl2-6.0bpw](https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411-exl2-6.0bpw) | 6.0 | 8.0 | 88.81 GB |
| [nintwentydo/Pixtral-Large-Instruct-2411-exl2-8.0bpw](https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411-exl2-8.0bpw) | 8.0 | 8.0 | 97.51 GB |