---
license: other
---
|
[WizardLM-33B-V1.0-Uncensored](https://huggingface.co/ehartford/WizardLM-33B-V1.0-Uncensored) merged with kaiokendev's [33b SuperHOT 8k LoRA](https://huggingface.co/kaiokendev/superhot-30b-8k-no-rlhf-test), quantized to 4-bit.
|
|
|
It was created with GPTQ-for-LLaMa, using group size 32 and act-order enabled, to minimize the perplexity loss relative to the FP16 model.
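For reference, a quantization run with those parameters would look roughly like the following. This is an illustrative sketch, not the exact command used for this repo: the model path, calibration dataset, and output filename are placeholders, and flag names may differ between GPTQ-for-LLaMa versions.

```
# Hypothetical GPTQ-for-LLaMa invocation (paths/names are placeholders):
python llama.py /path/to/WizardLM-33B-V1.0-Uncensored c4 \
    --wbits 4 \
    --groupsize 32 \
    --act-order \
    --save_safetensors wizardlm-33b-superhot-8k-4bit-32g.safetensors
```

Smaller group sizes (like 32) track the FP16 weights more closely than the common 128, at the cost of a slightly larger quantized file.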
|
|
|
I HIGHLY suggest using exllama to avoid VRAM issues.
|
|
|
Use compress_pos_emb = 4 for any context length up to 8192 tokens.
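compress_pos_emb = 4 works by linearly interpolating the rotary position indices, so an 8192-token sequence is mapped back into the 2048-position window the base model was trained on. A minimal sketch of the idea (illustrative only, not exllama's actual implementation):

```python
# Sketch of linear position interpolation, the idea behind SuperHOT /
# compress_pos_emb. Illustrative only; not exllama's actual code.

def scaled_positions(seq_len: int, compress_pos_emb: int) -> list[float]:
    """Map token indices 0..seq_len-1 onto a compressed position range.

    With compress_pos_emb = 4, an 8192-token sequence gets positions
    0.0 .. 2047.75, staying inside the base model's trained 2048 window.
    """
    return [i / compress_pos_emb for i in range(seq_len)]

positions = scaled_positions(8192, 4)
print(positions[-1])  # 2047.75 -- fits the base 2048-token window
```

This is why the value 4 pairs with 8192: 8192 / 4 = 2048, the original context length.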
|
|
|
If you have two 24 GB GPUs, use the following split to avoid out-of-memory errors at 8192 context:
|
|
|
gpu_split: 9,21 |
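As a rough sketch of where that setting goes: in exllama's example scripts the per-GPU VRAM allocation (in GB) is passed on the command line, while in text-generation-webui it is the gpu-split field of the ExLlama loader. The paths below are placeholders and the exact flag may vary by version:

```
# Illustrative only; model directory is a placeholder.
python example_chatbot.py -d /path/to/model -gs 9,21
```

The uneven 9/21 split leaves headroom on the first GPU, which also holds the growing attention cache at long context.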