ProphetOfBostrom committed on
Commit d7bde7a
1 Parent(s): f37cbe5

readme notice but i'm very sleepy please correct my mistakes for me thanks

Files changed (1): README.md +31 -0
README.md CHANGED
@@ -9,7 +9,38 @@ tags:
  - nsfw
  - mergekit
  - merge
+ - HQQ
+ - 2bit
+ library_name: transformers
+ ---
+ ## BagelMix-8x7B branch 2g16-4g64-HQQ
+ By [Undi95](https://huggingface.co/Undi95/BagelMix-8x7B)
+
+ #### (this readme has been written by a sleepy person. /disclaimer)
+ ---
+
+ The main branch uses the same quant config as last time, [the reference one from Mobius Labs](https://huggingface.co/mobiuslabsgmbh/Mixtral-8x7B-v0.1-hf-attn-4bit-moe-2bit-HQQ).
+
+ The label I've chosen refers to 2-bit linear (expert) layers with a group size of 16 parameters (with 8-bit quantized group metadata), and 4-bit attention layers with a group size of 64.
+ The actual bits per weight (bpw) is therefore higher than 2, in no small part because we're adding roughly another byte of group metadata for every 4 bytes of packed weights (I think??) in the linear layers.
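+
+ For reference, here is a minimal sketch (based on the Mobius reference card linked above, not copied from this repo) of what the main-branch 2g16-4g64 label means in HQQ terms, plus a back-of-the-envelope bpw estimate under the assumption that each group's zero point and scale cost roughly 8 bits apiece:
+ ```
+ from hqq.core.quantize import BaseQuantizeConfig
+
+ # Main branch ("2g16-4g64"): 4-bit attention in groups of 64, 2-bit experts in groups of 16.
+ attn_prams     = BaseQuantizeConfig(nbits=4, group_size=64, quant_zero=True, quant_scale=True)
+ attn_prams['scale_quant_params']['group_size'] = 256   # as in the reference script
+ experts_params = BaseQuantizeConfig(nbits=2, group_size=16, quant_zero=True, quant_scale=True)
+
+ # Rough bpw for the 2g16 expert layers, assuming ~8 bits each for the quantized per-group
+ # zero point and scale (their own second-level grouping shaves this down a little):
+ group_size, nbits, meta_bits = 16, 2, 8 + 8
+ print(nbits + meta_bits / group_size)   # -> 3.0, i.e. noticeably more than 2 bpw
+ ```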
+
+ From what I can gather from HQQ's source code, the gate network isn't quantised (because it's tiny and very important).
+
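+ A quick way to check that (a minimal sketch, assuming `model` is the already-quantized model object): list which submodules ended up as `HQQLinear`; the MoE gate (`block_sparse_moe.gate`) should still show up as a plain `nn.Linear`.
+ ```
+ import torch.nn as nn
+ from hqq.core.quantize import HQQLinear
+
+ # Assumes `model` is the HQQ-quantized Mixtral-style model.
+ for name, module in model.named_modules():
+     if isinstance(module, HQQLinear):
+         print("quantized :", name)   # attention projections, expert w1/w2/w3
+     elif isinstance(module, nn.Linear) and name.endswith("gate"):
+         print("left alone:", name)   # the router / gate network
+ ```
+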
+ That reasoning has led me to experiment with taking more bits away from the expert/linear layers and putting them into the attention layers.
+
+ I've already got a slightly heavier model with 2g16 experts and 8g512 attention (not really sure how meaningful groups of 512 are, but whatever).
+ Its config looks like this, and it is *not the model on the main branch*:
+ ```
+ from hqq.core.quantize import BaseQuantizeConfig
+
+ # Experimental config: 8-bit attention in groups of 512, 2-bit experts in groups of 16.
+ attn_prams = BaseQuantizeConfig(nbits=8, group_size=512, quant_zero=True, quant_scale=True) # MAIN BRANCH IS nbits=4 group_size=64 !!!
+ attn_prams['scale_quant_params']['group_size'] = 512 # was 256; this is the group size used when the scales themselves are quantized (quant_scale=True)
+ experts_params = BaseQuantizeConfig(nbits=2, group_size=16, quant_zero=True, quant_scale=True)
+ ```
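+
+ For context, this is roughly how the reference Mobius script (the one included in this repo) takes the `attn_prams` / `experts_params` above, wires them into a per-layer quant config, and runs the quantization. It's a sketch from memory of the hqq API at the time, so double-check it against the actual script:
+ ```
+ from hqq.engine.hf import HQQModelForCausalLM
+
+ model_id = "Undi95/BagelMix-8x7B"
+ model    = HQQModelForCausalLM.from_pretrained(model_id)
+
+ # Map each linear-layer tag to a quant config: attention projections vs. MoE expert layers.
+ quant_config = {}
+ for tag in ('self_attn.q_proj', 'self_attn.k_proj', 'self_attn.v_proj', 'self_attn.o_proj'):
+     quant_config[tag] = attn_prams
+ for tag in ('block_sparse_moe.experts.w1', 'block_sparse_moe.experts.w2', 'block_sparse_moe.experts.w3'):
+     quant_config[tag] = experts_params
+
+ # The memory-hungry step (hence the "big swap partition" note below).
+ model.quantize_model(quant_config=quant_config)
+ model.save_quantized("BagelMix-8x7B-2g16-8g512-HQQ")   # method names may differ slightly between hqq versions
+ ```
+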
+ Again, that experimental config is not what you're downloading if you grab this repo right now: I want to see if I can actually keep the bpw down.
+ These will be uploaded as alternate branches of this repo if they seem worth doing.
+ I might also fiddle with 2g32 or even 3g128 or such for the experts, given their most delectable sparseness (sketched below).
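+
+ For what it's worth, those labels would translate to expert configs along these lines (purely hypothetical, nothing with these settings has been uploaded):
+ ```
+ # Hypothetical expert configs for the variants mentioned above; not uploaded anywhere.
+ experts_2g32  = BaseQuantizeConfig(nbits=2, group_size=32,  quant_zero=True, quant_scale=True)
+ experts_3g128 = BaseQuantizeConfig(nbits=3, group_size=128, quant_zero=True, quant_scale=True)
+ ```
+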
 
+ ### You could also use the included Python script (and a big swap partition) to make these yourself; again, it's just the one from Mobius Labs themselves.
+ ### PS: read Sleeper Agents (2024/01) :-)
  ---
  # BagelMix