File size: 4,737 Bytes
5355465 edf850c 5355465 cac15ea ee2d9ff cac15ea ee2d9ff cac15ea ee2d9ff cac15ea 6ae22c9 95c64cd 6ae22c9 858be9c edf850c 43bbcdf edf850c fe62f57 95c64cd edf850c 6d2122a edf850c 6d2122a edf850c 6d2122a edf850c 6d2122a edf850c 6d2122a edf850c cac15ea ee2d9ff cac15ea ee2d9ff cac15ea ee2d9ff cac15ea ee2d9ff cac15ea ee2d9ff cac15ea edf850c ee2d9ff |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 |
---
license: other
inference: false
---
<!-- header start -->
<div style="width: 100%;">
<img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
</div>
<div style="display: flex; justify-content: space-between; width: 100%;">
<div style="display: flex; flex-direction: column; align-items: flex-start;">
<p><a href="https://discord.gg/Jq4vkcDakD">Chat & support: my new Discord server</a></p>
</div>
<div style="display: flex; flex-direction: column; align-items: flex-end;">
<p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
</div>
</div>
<!-- header end -->
# Alpaca LoRA 65B GPTQ 4bit
This is a [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) 4bit quantisation of [changsung's alpaca-lora-65B](https://huggingface.co/chansung/alpaca-lora-65b)
I also have 4bit and 2bit GGML files for cPU inference available here: [TheBloke/alpaca-lora-65B-GGML](https://huggingface.co/TheBloke/alpaca-lora-65B-GGML).
## These files need a lot of VRAM!
I believe they will work on 2 x 24GB cards, and I hope that at least the 1024g file will work on an A100 40GB.
I can't guarantee that the two 128g files will work in only 40GB of VRAM.
I haven't specifically tested VRAM requirements yet but will aim to do so at some point. If you have any experiences to share, please do so in the comments.
If you want to try CPU inference instead, check out my GGML repo: [TheBloke/alpaca-lora-65B-GGML](https://huggingface.co/TheBloke/alpaca-lora-65B-GGML).
## Provided files
Three files are provided, in separate branches.
* `alpaca-lora-65B-GPTQ-4bit-128g.no-act-order.safetensors` - branch main
* Will require ~40GB of VRAM, meaning you'll need an A100 or 2 x 24GB cards.
* Parameters: Groupsize = 128g. No act-order.
* Command used to create the GPTQ:
```
CUDA_VISIBLE_DEVICES=0 python3 llama.py alpaca-lora-65B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors alpaca-lora-65B-GPTQ-4bit-128g.no-act-order.safetensors
```
* `alpaca-lora-65B-GPTQ-4bit-128g.safetensors` - branch gptq-4bit-128g-actorder_True
* Parameters: Groupsize = 128g. act-order.
* Command used to create the GPTQ:
```
CUDA_VISIBLE_DEVICES=0 python3 llama.py alpaca-lora-65B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors alpaca-lora-65B-GPTQ-4bit-128g.safetensors
```
* `alpaca-lora-65B-GPTQ-4bit-1024g.safetensors` - branch gptq-4bit-1024g-actorder_True
* Parameters: Groupsize = 1024g. act-order.
* Command used to create the GPTQ:
```
CUDA_VISIBLE_DEVICES=0 python3 llama.py alpaca-lora-65B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 1024 --save_safetensors alpaca-lora-65B-GPTQ-4bit-1024g.safetensors
```
## How to run in `text-generation-webui`
Please see one of my more recent repos for instructions on loading GPTQ models in text-generation-webui.
<!-- footer start -->
## Discord
For further support, and discussions on these models and AI in general, join us at:
[TheBloke AI's Discord server](https://discord.gg/Jq4vkcDakD)
## Thanks, and how to contribute.
Thanks to the [chirper.ai](https://chirper.ai) team!
I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.
If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.
Donaters will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.
* Patreon: https://patreon.com/TheBlokeAI
* Ko-Fi: https://ko-fi.com/TheBlokeAI
**Patreon special mentions**: Aemon Algiz, Dmitriy Samsonov, Nathan LeClaire, Trenton Dambrowitz, Mano Prime, David Flickinger, vamX, Nikolai Manek, senxiiz, Khalefa Al-Ahmad, Illia Dulskyi, Jonathan Leane, Talal Aujan, V. Lukas, Joseph William Delisle, Pyrater, Oscar Rangel, Lone Striker, Luke Pendergrass, Eugene Pentland, Sebastain Graf, Johann-Peter Hartman.
Thank you to all my generous patrons and donaters!
<!-- footer end -->
# Original model card not provided
No model card was provided in [changsung's original repository](https://huggingface.co/chansung/alpaca-lora-65b).
Based on the name, I assume this is the result of fine tuning using the original GPT 3.5 Alpaca dataset. It is unknown as to whether the original Stanford data was used, or the [cleaned tloen/alpaca-lora variant](https://github.com/tloen/alpaca-lora).
|