This is a 4-bit GPTQ quantization of the following model: https://huggingface.co/jondurbin/airoboros-13b-gpt4

The quantization was made with the triton branch of GPTQ-for-LLaMa: https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/triton

I enabled all of the GPTQ options available on that branch (true_sequential + act_order + groupsize 128):

CUDA_VISIBLE_DEVICES=0 python llama.py ./airoboros-13b-gpt4-TRITON c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors airoboros-13b-gpt4-128g-ts-ao.safetensors
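
To give a rough idea of what `--wbits 4 --groupsize 128` means, here is a minimal sketch in which every run of 128 consecutive weights shares one scale and one offset, and each weight is stored as a 4-bit code. This is plain round-to-nearest quantization, not the GPTQ algorithm itself (GPTQ additionally corrects rounding error using calibration data, here c4), and it does not model act_order or true_sequential. The function names are hypothetical and are not taken from GPTQ-for-LLaMa.

```python
def quantize_group(weights, bits=4):
    """Round-to-nearest quantization of one group (illustrative, not GPTQ)."""
    levels = (1 << bits) - 1                 # 15 steps for 4-bit codes
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0        # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map integer codes back to approximate float weights."""
    return [c * scale + lo for c in codes]


def quantize_row(row, groupsize=128, bits=4):
    """Quantize and dequantize a row, one scale/offset per `groupsize` weights."""
    out = []
    for start in range(0, len(row), groupsize):
        group = row[start:start + groupsize]
        codes, scale, lo = quantize_group(group, bits)
        out.extend(dequantize_group(codes, scale, lo))
    return out


# Toy row of 256 weights in [-1, 1], i.e. two groups of 128.
row = [((i * 37) % 101) / 50.0 - 1.0 for i in range(256)]
approx = quantize_row(row)
max_err = max(abs(a - b) for a, b in zip(row, approx))
print(f"max reconstruction error: {max_err:.4f}")
```

A smaller group size means more scales and offsets to store, but each group covers a narrower range of values, so the rounding error per weight tends to be lower.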

Airoboros 13b gpt4 TRITON (g128 - ts - ao)
PPL: 5.480927467346191
max memory(MiB): 8590.25

Airoboros 13b gpt4 CUDA (g128 - ts)
PPL: 5.535770893096924
max memory(MiB): 8750.4697265625
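
To put the two results side by side, the short script below computes how much lower the TRITON figures are relative to the CUDA ones. The numbers are copied from the results above; the variable and function names are only illustrative.

```python
# Figures copied from the results listed above.
triton = {"ppl": 5.480927467346191, "max_mem_mib": 8590.25}
cuda = {"ppl": 5.535770893096924, "max_mem_mib": 8750.4697265625}


def relative_drop(new, old):
    """Percentage by which `new` is lower than `old`."""
    return (old - new) / old * 100.0


ppl_drop = relative_drop(triton["ppl"], cuda["ppl"])
mem_drop = relative_drop(triton["max_mem_mib"], cuda["max_mem_mib"])

print(f"PPL: {ppl_drop:.2f}% lower with TRITON (g128 - ts - ao)")
print(f"Max memory: {mem_drop:.2f}% lower with TRITON (g128 - ts - ao)")
```

On these numbers the TRITON quantization comes out about 1% lower in perplexity and just under 2% lower in peak memory. Note that the two runs differ in both the kernel and the use of act_order, so the gap cannot be attributed to either one alone.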