---
license: other
inference: false
---

# Alpaca LoRA 65B GPTQ 4bit

This is a [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) 4bit quantisation of [chansung's alpaca-lora-65B](https://huggingface.co/chansung/alpaca-lora-65b).

I also have 4bit and 2bit GGML files for CPU inference available here: [TheBloke/alpaca-lora-65B-GGML](https://huggingface.co/TheBloke/alpaca-lora-65B-GGML).

## These files need a lot of VRAM!

I believe they will work on 2 x 24GB cards, and I hope that at least the 1024g file will work on an A100 40GB.

I can't guarantee that the two 128g files will work in only 40GB of VRAM.

I haven't specifically tested VRAM requirements yet but will aim to do so at some point. If you have any experiences to share, please do so in the comments.

If you want to try CPU inference instead, check out my GGML repo: [TheBloke/alpaca-lora-65B-GGML](https://huggingface.co/TheBloke/alpaca-lora-65B-GGML).

## Provided files

Three files are provided, in separate branches.

* `alpaca-lora-65B-GPTQ-4bit-128g.no-act-order.safetensors` - branch `main`
  * Will require ~40GB of VRAM, meaning you'll need an A100 or 2 x 24GB cards.
  * Parameters: Groupsize = 128g. No act-order.
  * Command used to create the GPTQ:
    ```
    CUDA_VISIBLE_DEVICES=0 python3 llama.py alpaca-lora-65B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors alpaca-lora-65B-GPTQ-4bit-128g.no-act-order.safetensors
    ```
* `alpaca-lora-65B-GPTQ-4bit-128g.safetensors` - branch `gptq-4bit-128g-actorder_True`
  * Parameters: Groupsize = 128g. act-order.
  * Command used to create the GPTQ:
    ```
    CUDA_VISIBLE_DEVICES=0 python3 llama.py alpaca-lora-65B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors alpaca-lora-65B-GPTQ-4bit-128g.safetensors
    ```
* `alpaca-lora-65B-GPTQ-4bit-1024g.safetensors` - branch `gptq-4bit-1024g-actorder_True`
  * Parameters: Groupsize = 1024g. act-order.
  * Command used to create the GPTQ:
    ```
    CUDA_VISIBLE_DEVICES=0 python3 llama.py alpaca-lora-65B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 1024 --save_safetensors alpaca-lora-65B-GPTQ-4bit-1024g.safetensors
    ```

## How to run in `text-generation-webui`

Please see one of my more recent repos for instructions on loading GPTQ models in text-generation-webui.

## Discord

For further support, and discussions on these models and AI in general, join us at:

[TheBloke AI's Discord server](https://discord.gg/Jq4vkcDakD)

## Thanks, and how to contribute.

Thanks to the [chirper.ai](https://chirper.ai) team!

I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.

If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.

Donors will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.

* Patreon: https://patreon.com/TheBlokeAI
* Ko-Fi: https://ko-fi.com/TheBlokeAI

**Patreon special mentions**: Aemon Algiz, Dmitriy Samsonov, Nathan LeClaire, Trenton Dambrowitz, Mano Prime, David Flickinger, vamX, Nikolai Manek, senxiiz, Khalefa Al-Ahmad, Illia Dulskyi, Jonathan Leane, Talal Aujan, V. Lukas, Joseph William Delisle, Pyrater, Oscar Rangel, Lone Striker, Luke Pendergrass, Eugene Pentland, Sebastain Graf, Johann-Peter Hartman.

Thank you to all my generous patrons and donors!

# Original model card not provided

No model card was provided in [chansung's original repository](https://huggingface.co/chansung/alpaca-lora-65b).

Based on the name, I assume this is the result of fine tuning using the original GPT 3.5 Alpaca dataset.
It is unknown whether the original Stanford data was used or the [cleaned tloen/alpaca-lora variant](https://github.com/tloen/alpaca-lora).
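
As a rough cross-check on the VRAM figures quoted under "These files need a lot of VRAM!", the size of the quantised weights alone can be estimated with simple arithmetic. The sketch below is not a measurement. It assumes a nominal 65 billion parameters, and it assumes each quantisation group stores one 16-bit scale and one 4-bit zero point. The function name and the `scale_bits` and `zero_bits` parameters are illustrative and are not part of GPTQ-for-LLaMa. It ignores any layers left unquantised, activations, and the context cache, all of which add to real usage.

```python
def quantized_weight_gib(n_params, wbits=4, groupsize=128,
                         scale_bits=16, zero_bits=4):
    """Approximate size of group-quantised weights in GiB.

    Assumption (not taken from the model card): every group of
    `groupsize` weights carries one scale of `scale_bits` bits and one
    zero point of `zero_bits` bits on top of the `wbits` per weight.
    """
    # Per-group metadata is amortised across the weights in the group.
    bits_per_weight = wbits + (scale_bits + zero_bits) / groupsize
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


N_PARAMS = 65e9  # nominal "65B"; the exact parameter count differs slightly

if __name__ == "__main__":
    for g in (128, 1024):
        size = quantized_weight_gib(N_PARAMS, groupsize=g)
        print(f"groupsize {g}: ~{size:.1f} GiB of weights")
```

Under these assumptions the weights come to roughly 30.4 GiB for groupsize 1024 and 31.5 GiB for groupsize 128. Real VRAM use will be higher once inference overhead is included, so treat these numbers as a lower bound rather than a requirement.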