---
base_model:
- Doctor-Shotgun/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss
- jondurbin/bagel-dpo-8x7b-v0.2
- NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss
license: cc-by-nc-4.0
tags:
- not-for-all-audiences
- nsfw
- mergekit
- merge
- HQQ
- 2bit
---
## BagelMix-8x7B - main branch 2g16-4g64-HQQ
Under 20 GB
By [Undi95](https://huggingface.co/Undi95/BagelMix-8x7B)
#### (This readme was written by a sleepy person. The link above goes to the original model, the link below to the Mixtral HQQ reference; the rest is rambling.)
---
[The main branch is the same quant config as last time, the reference one from Mobius Labs here](https://huggingface.co/mobiuslabsgmbh/Mixtral-8x7B-v0.1-hf-attn-4bit-moe-2bit-HQQ).
The label I've chosen refers to 2-bit linear (expert) layers with a group size of 16 parameters (each group carrying 8-bit quantized metadata), and 4 bits in groups of 64 for the attention layers.
The actual bpw is therefore higher than 2, in no small part because every 4 bytes of packed 2-bit weights pick up roughly another byte of group metadata (I think) in the linear layers.
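To put a rough number on that, here is a back-of-the-envelope estimate. It assumes one 8-bit zero point and one 8-bit scale per group of 16 weights, which is my reading of what `quant_zero`/`quant_scale` end up storing; treat the exact layout as an assumption:
```python
# Rough bpw estimate for the 2-bit, group_size=16 expert layers.
# Assumes one 8-bit zero point and one 8-bit scale per group of 16 weights
# and ignores the second-level metadata from quantising the scales themselves.
weight_bits = 2
group_size = 16
meta_bits = 8 + 8                                  # zero point + scale per group
expert_bpw = weight_bits + meta_bits / group_size  # 3.0 bits per weight

# Attention at 4-bit, group_size=64 works out to 4 + 16/64 = 4.25 bpw, but the
# experts hold the overwhelming majority of Mixtral's ~47B parameters, so the
# model as a whole lands somewhere around 3 bpw:
print(expert_bpw)                                  # 3.0
print(46.7e9 * expert_bpw / 8 / 2**30, "GiB")      # ~16 GiB treating everything at ~3 bpw,
                                                   # the same ballpark as the upload size below
```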
From what I can gather of HQQ's source code, the gate ('expert'-selection) network isn't quantised at all, because it's tiny and very important.
That is also why the attention layers get 4 bits: in a MoE the attention is small (it's shared between all the 'experts'), so at 2 bits it would quantise about as badly as a 2 bpw Mistral does while saving very little memory.
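For reference, the main-branch (2g16-4g64) settings would look roughly like this as an HQQ per-layer config, following the linked Mobius Labs card. Note that `block_sparse_moe.gate` simply never gets an entry, which is how the router stays unquantised. This is a sketch, not a file shipped in this repo:
```python
from hqq.core.quantize import BaseQuantizeConfig

# Main branch: 4-bit attention in groups of 64, 2-bit experts in groups of 16
attn_params = BaseQuantizeConfig(nbits=4, group_size=64, quant_zero=True, quant_scale=True)
experts_params = BaseQuantizeConfig(nbits=2, group_size=16, quant_zero=True, quant_scale=True)

quant_config = {}
for proj in ("q_proj", "k_proj", "v_proj", "o_proj"):   # attention, shared by all experts
    quant_config[f"self_attn.{proj}"] = attn_params
for w in ("w1", "w2", "w3"):                            # expert MLPs, the bulk of the weights
    quant_config[f"block_sparse_moe.experts.{w}"] = experts_params
# no entry for 'block_sparse_moe.gate' -> the routing network is left in full precision
```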
That reasoning has led me to experiment with taking more bits away from the expert/linear layers and putting them into the attention layers.
I've already got a slightly heavier model with 2g16 experts and 8g512 attention (not really sure how meaningful groups of 512 are, but whatever),
whose config looks like this. To be clear, this is *not the model on the main branch*:
```python
from hqq.core.quantize import BaseQuantizeConfig

# 8-bit attention in groups of 512 -- MAIN BRANCH IS nbits=4, group_size=64 !!!
attn_params = BaseQuantizeConfig(nbits=8, group_size=512, quant_zero=True, quant_scale=True)
# group size used when quantising the scales themselves (the reference config used 256)
attn_params['scale_quant_params']['group_size'] = 512
# 2-bit experts in groups of 16, same as the main branch
experts_params = BaseQuantizeConfig(nbits=2, group_size=16, quant_zero=True, quant_scale=True)
```
Again, this is not what you're downloading if you grab the main branch right now: I want to see whether I can actually keep the bpw down.
These will be uploaded as alternate branches of this repo if they seem worth doing.
I might also fiddle with 2g32 or even 3g128 or similar for the experts, or try to stop HQQ from casting BF16 to FP16 for no reason.
#### You could also use the included/linked Python script (and a big swap partition) to make these yourself.
For Mixtral, using hqq 0.1.2.post: you will need >180 GB of physically addressable memory, but it doesn't all have to be RAM; set yourself up with a ~160 GB swap partition.
The VRAM requirement starts at zero and never grows much beyond the emerging quantised model, so you can make any quant you can run.
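A minimal sketch of what that script does for the main-branch quant, using hqq 0.1.2's `HQQModelForCausalLM` wrapper as in the linked reference card. The model id and save path here are my guesses; treat this as a template to check against the reference, not a tested recipe:
```python
# Sketch only, adapted from the Mobius Labs Mixtral-HQQ example for hqq 0.1.2.
from hqq.core.quantize import BaseQuantizeConfig
from hqq.engine.hf import HQQModelForCausalLM

model_id = "Undi95/BagelMix-8x7B"  # the unquantised merge

attn_params = BaseQuantizeConfig(nbits=4, group_size=64, quant_zero=True, quant_scale=True)
experts_params = BaseQuantizeConfig(nbits=2, group_size=16, quant_zero=True, quant_scale=True)
quant_config = {f"self_attn.{p}": attn_params for p in ("q_proj", "k_proj", "v_proj", "o_proj")}
quant_config.update({f"block_sparse_moe.experts.{w}": experts_params for w in ("w1", "w2", "w3")})

# Loading the bf16 merge is what needs the ~180 GB of (swap-backed) memory;
# VRAM use starts near zero and grows with the emerging quantised model.
model = HQQModelForCausalLM.from_pretrained(model_id)
model.quantize_model(quant_config=quant_config)
model.save_quantized("BagelMix-8x7B-2g16-4g64-HQQ")
```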
#### This takes about 10 minutes with the current optimizer; it takes me all day to upload a ~18 GiB file.
## PS: read Sleeper Agents (2024/01) :-)
---
# BagelMix
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
## Merge Details
### Merge Method
This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method using [jondurbin/bagel-dpo-8x7b-v0.2](https://huggingface.co/jondurbin/bagel-dpo-8x7b-v0.2) as a base.
### Models Merged
The following models were included in the merge:
* [Doctor-Shotgun/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss](https://huggingface.co/Doctor-Shotgun/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss)
* [NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss](https://huggingface.co/NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss)
### Configuration
The following YAML configuration was used to produce this model:
```yaml
models:
  - model: jondurbin/bagel-dpo-8x7b-v0.2
    parameters:
      density: 1.0
      weight: 1.0
  - model: Doctor-Shotgun/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss
    parameters:
      density: 0.5
      weight: [0.33, 0.4, 0.33]
  - model: NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss
    parameters:
      density: [0.33, 0.45, 0.66]
      weight: 0.66
merge_method: dare_ties
base_model: jondurbin/bagel-dpo-8x7b-v0.2
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: union
```
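For completeness, a merge like this can be reproduced from that YAML with mergekit's Python entry point. The sketch below follows mergekit's README at the time of writing; the config filename and output path are placeholders, and the option names are not guaranteed to match the exact mergekit version used for this model:
```python
# Sketch: run the DARE-TIES merge from the YAML above via mergekit's Python API.
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("bagelmix.yml", "r", encoding="utf-8") as fp:  # the YAML config above
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./BagelMix-8x7B",    # placeholder output directory
    options=MergeOptions(
        cuda=False,                # CPU works, it just takes longer
        copy_tokenizer=True,       # copy tokenizer files into the output directory
        lazy_unpickle=True,        # reduces peak RAM while reading source checkpoints
    ),
)
```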
If you want to support me, you can [here](https://ko-fi.com/undiai).