---
base_model:
- Doctor-Shotgun/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss
- jondurbin/bagel-dpo-8x7b-v0.2
- NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss
license: cc-by-nc-4.0
tags:
- not-for-all-audiences
- nsfw
- mergekit
- merge
- HQQ
- 2bit
---
## BagelMix-8x7B - main branch 2g16-4g64-HQQ

Under 20 GB

By [Undi95](https://huggingface.co/Undi95/BagelMix-8x7B)

#### (this readme has been written by a sleepy person. the link above takes you to the original model, the link below to the Mixtral HQQ reference. the rest is rambling)

---
[the main branch is the same quant config as last time, the reference one from mobius here](https://huggingface.co/mobiuslabsgmbh/Mixtral-8x7B-v0.1-hf-attn-4bit-moe-2bit-HQQ)

the label i've chosen here means 2-bit linear (expert) layers with a group size of 16 (the per-group metadata itself stored in 8 bits), and 4 bits in groups of 64 for the attention layers.

thus the actual bpw is higher than 2, in no small part because we're adding another byte every 4 bytes (i think??) for the linear layers.
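
if you want a rough sense of where the extra bits come from, here's a back-of-envelope estimator (my own sketch, not hqq code; it assumes one 8-bit zero-point and one 8-bit scale per group and ignores hqq's second-level meta grouping, so it will overcount slightly):

```python
# rough bits-per-weight for an hqq linear layer: nbits per weight, plus a
# quantised zero-point and scale (meta_bits each) shared by every group
def approx_bpw(nbits: int, group_size: int, meta_bits: int = 8) -> float:
    return nbits + 2 * meta_bits / group_size

print(approx_bpw(2, 16))   # experts, 2g16    -> ~3.0 bpw
print(approx_bpw(4, 64))   # attention, 4g64  -> ~4.25 bpw
print(approx_bpw(8, 512))  # attention, 8g512 -> ~8.03 bpw
```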
from what I can gather of hqq's source code, the gate ('expert' selection) network isn't quantised (because it's tiny and very important).

this is also the reason we quantise the attention layers at 4 bits: in a MoE the attention is small (it's shared between all the 'experts'), so it's cheap to keep at 4 bits, whereas crushing it to 2 bits would degrade it about like a 2 bpw mistral.

such reasoning has led me to experiment with taking more bits away from the expert/linear layers and putting them into the attention layers.

i've currently got a slightly heavier model already, with 2g16 experts and 8g512 attention (not really sure how meaningful groups of 512 are, but w/e),

which would look like this - note this is *not the model on the main branch*:
```python
from hqq.core.quantize import BaseQuantizeConfig

# MAIN BRANCH IS nbits=4, group_size=64 for attention !!!
attn_prams = BaseQuantizeConfig(nbits=8, group_size=512, quant_zero=True, quant_scale=True)
attn_prams['scale_quant_params']['group_size'] = 512  # was 256 in the reference config, not sure what this does lol
experts_params = BaseQuantizeConfig(nbits=2, group_size=16, quant_zero=True, quant_scale=True)
```
again, this is not what you're downloading if you get this right now: I want to see if I can actually keep the bpw down.

these will be uploaded as alternate branches to this repo if they seem worth doing.

might fiddle with 2g32 or even 3g128 or such for the experts. or try to stop HQQ from casting BF16 to FP16 for no reason.

#### you could also use the included/linked python script (and a big swap partition) to make them yourself.
```
for mixtral, using hqq 0.1.2.post:
you will need >180 gigabytes of physically addressable memory - but it doesn't need to be RAM. Set yourself up with a ~160 GB swap partition.
the VRAM requirement is initially zero and never much larger than the emerging model, so you can make any quant you can run.
```
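
for reference, the quantisation flow on the linked mobius card looks roughly like this - a sketch assuming hqq 0.1.x's `HQQModelForCausalLM` wrapper, with the model id as a placeholder; note that `block_sparse_moe.gate` is simply absent from `quant_config`, which is why the routing network stays unquantised:

```python
from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer
from hqq.core.quantize import BaseQuantizeConfig

model_id = 'Undi95/BagelMix-8x7B'  # the unquantised merge linked at the top
model = HQQModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# main-branch settings: 4g64 attention, 2g16 experts
attn_prams = BaseQuantizeConfig(nbits=4, group_size=64, quant_zero=True, quant_scale=True)
attn_prams['scale_quant_params']['group_size'] = 256
experts_params = BaseQuantizeConfig(nbits=2, group_size=16, quant_zero=True, quant_scale=True)

quant_config = {}
# attention layers
quant_config['self_attn.q_proj'] = attn_prams
quant_config['self_attn.k_proj'] = attn_prams
quant_config['self_attn.v_proj'] = attn_prams
quant_config['self_attn.o_proj'] = attn_prams
# expert (linear) layers - the gate/router is deliberately left out
quant_config['block_sparse_moe.experts.w1'] = experts_params
quant_config['block_sparse_moe.experts.w2'] = experts_params
quant_config['block_sparse_moe.experts.w3'] = experts_params

model.quantize_model(quant_config=quant_config)
```

swap in the experimental attn/experts configs from the block further up to build the alternate quants instead.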
#### this takes about 10 minutes with the current optimizer - it takes me all day to upload an ~18 GiB file.

## ps read Sleeper Agents (2024/01) :-)

---
# BagelMix

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details

### Merge Method

This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method, using [jondurbin/bagel-dpo-8x7b-v0.2](https://huggingface.co/jondurbin/bagel-dpo-8x7b-v0.2) as a base.
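
For intuition: DARE sparsifies each fine-tune's delta from the base model by randomly dropping parameters and rescaling the survivors, and TIES then resolves sign conflicts before summing the deltas. A toy single-tensor illustration follows (my own sketch, not mergekit's implementation; it ignores the per-layer weight gradients and the `normalize` option from the config below):

```python
import torch

def dare_ties_merge(base, tuned, densities, weights):
    """Toy single-tensor DARE-TIES: drop-and-rescale each model's delta from
    the base, elect a dominant sign per parameter, then sum only the deltas
    that agree with that sign. Illustration only."""
    deltas = []
    for t, density, w in zip(tuned, densities, weights):
        delta = t - base
        keep = (torch.rand_like(delta) < density).float()  # DARE: random drop
        deltas.append(w * delta * keep / density)          # ...and rescale survivors
    stacked = torch.stack(deltas)
    elected = torch.sign(stacked.sum(dim=0))               # TIES: sign election
    agree = (torch.sign(stacked) == elected).float()
    return base + (stacked * agree).sum(dim=0)             # disjoint merge
```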
### Models Merged

The following models were included in the merge:

* [Doctor-Shotgun/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss](https://huggingface.co/Doctor-Shotgun/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss)
* [NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss](https://huggingface.co/NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss)

### Configuration

The following YAML configuration was used to produce this model:
```yaml
models:
  - model: jondurbin/bagel-dpo-8x7b-v0.2
    parameters:
      density: 1.0
      weight: 1.0
  - model: Doctor-Shotgun/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss
    parameters:
      density: 0.5
      weight: [0.33, 0.4, 0.33]
  - model: NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss
    parameters:
      density: [0.33, 0.45, 0.66]
      weight: 0.66
merge_method: dare_ties
base_model: jondurbin/bagel-dpo-8x7b-v0.2
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: union
```
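
To reproduce the merge, save the config above as e.g. `config.yml` and feed it to mergekit, either via the `mergekit-yaml config.yml ./output-dir` CLI or through the Python API, roughly like this (a sketch; option names can shift between mergekit versions):

```python
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# load and validate the YAML merge recipe
with open("config.yml", encoding="utf-8") as f:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    merge_config,
    out_path="./BagelMix-8x7B",   # output directory for the merged weights
    options=MergeOptions(
        cuda=False,                # set True if you have the VRAM to spare
        copy_tokenizer=True,
        lazy_unpickle=True,        # lower peak RAM while loading shards
        low_cpu_memory=False,
    ),
)
```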
If you want to support me, you can [here](https://ko-fi.com/undiai). |