Panchovix/WizardLM-33B-V1.0-Uncensored-SuperHOT-8k-4bit-32g

license: other

This model was quantized with GPTQ-for-LLaMA, using group size 32 and act order true as parameters, to keep perplexity as close as possible to that of the FP16 model.

I HIGHLY suggest using exllama to avoid some VRAM issues.

Use the following settings (max_seq_len is the context length):

If max_seq_len = 4096, compress_pos_emb = 2

If max_seq_len = 8192, compress_pos_emb = 4
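
Both pairs above are consistent with dividing max_seq_len by 2048, the context length of the base LLaMA models. The following is a minimal sketch of that rule, not something taken from exllama itself: the helper name `compress_pos_emb_for` and the 2048 constant are assumptions made for illustration.

```python
# Assumption: the base model's native context is 2048 tokens, which matches
# both pairs listed above (4096 -> 2, 8192 -> 4).
NATIVE_CONTEXT = 2048


def compress_pos_emb_for(max_seq_len: int, native_context: int = NATIVE_CONTEXT) -> int:
    """Hypothetical helper: return the compress_pos_emb value for a context length."""
    if max_seq_len <= 0 or max_seq_len % native_context != 0:
        raise ValueError(
            f"max_seq_len must be a positive multiple of {native_context}, got {max_seq_len}"
        )
    return max_seq_len // native_context


if __name__ == "__main__":
    for ctx in (4096, 8192):
        print(f"max_seq_len = {ctx}, compress_pos_emb = {compress_pos_emb_for(ctx)}")
```

Context lengths other than the two listed above are untested here, so the helper rejects anything that is not a whole multiple of 2048 rather than guessing.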

If you have two 24 GB VRAM GPUs, use the following to avoid Out of Memory errors at 8192 context:

gpu_split: 9,21
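
To see how much room that split leaves on each card, here is a small sketch. It assumes the gpu_split values are per-GPU VRAM allotments in GB, listed in device order, and that each card has 24 GB. The function names `parse_gpu_split` and `headroom_gb` are hypothetical and not part of any loader.

```python
from typing import List


def parse_gpu_split(gpu_split: str) -> List[float]:
    """Hypothetical helper: turn a string such as "9,21" into per-GPU GB values."""
    return [float(part) for part in gpu_split.split(",") if part.strip()]


def headroom_gb(gpu_split: str, vram_per_gpu_gb: float = 24.0) -> List[float]:
    """Return the VRAM (in GB) left unallocated on each GPU.

    Assumption: every GPU has the same capacity, vram_per_gpu_gb.
    """
    allotments = parse_gpu_split(gpu_split)
    for index, gb in enumerate(allotments):
        if gb > vram_per_gpu_gb:
            raise ValueError(f"GPU {index}: {gb} GB exceeds {vram_per_gpu_gb} GB capacity")
    return [vram_per_gpu_gb - gb for gb in allotments]


if __name__ == "__main__":
    print(headroom_gb("9,21"))
```

Under these assumptions the first card keeps 15 GB unallocated and the second keeps 3 GB, so the split deliberately reserves far more room on the first GPU.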