---
license: other
inference: false
---
<!-- header start -->
<div style="width: 100%;">
    <img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
</div>
<div style="display: flex; justify-content: space-between; width: 100%;">
    <div style="display: flex; flex-direction: column; align-items: flex-start;">
        <p><a href="https://discord.gg/Jq4vkcDakD">Chat & support: my new Discord server</a></p>
    </div>
    <div style="display: flex; flex-direction: column; align-items: flex-end;">
        <p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
    </div>
</div>
<!-- header end -->

# Alpaca LoRA 65B GPTQ 4bit

This is a [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) 4bit quantisation of [chansung's alpaca-lora-65B](https://huggingface.co/chansung/alpaca-lora-65b).

I also have 4bit and 2bit GGML files for CPU inference available here: [TheBloke/alpaca-lora-65B-GGML](https://huggingface.co/TheBloke/alpaca-lora-65B-GGML).

## These files need a lot of VRAM!

I believe they will work on 2 x 24GB cards, and I hope that at least the 1024g file will work on an A100 40GB.

I can't guarantee that the two 128g files will work in only 40GB of VRAM.

I haven't specifically tested VRAM requirements yet but will aim to do so at some point. If you have any experiences to share, please do so in the comments.
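
As a rough sanity check on the figures above, the storage needed for the quantised weights alone can be estimated from the bit width and group size. The sketch below is an estimate under stated assumptions, not a measurement: it assumes a round 65 billion parameters, one 16-bit scale plus one 4-bit zero point per group, and it ignores unquantised layers, activations and the KV cache, so real VRAM use will be higher. The function name `estimate_weight_gib` is made up for illustration.

```python
def estimate_weight_gib(n_params, wbits, groupsize, scale_bits=16):
    """Rough size in GiB of GPTQ-quantised weights.

    ASSUMPTION: each group of `groupsize` weights stores one scale
    (`scale_bits` bits) and one zero point (`wbits` bits) on top of the
    `wbits` bits used per weight. Unquantised layers are not counted.
    """
    overhead_bits_per_weight = (scale_bits + wbits) / groupsize
    bits_per_weight = wbits + overhead_bits_per_weight
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


# ASSUMPTION: 65e9 is a round figure for the parameter count.
N_PARAMS = 65e9

for groupsize in (128, 1024):
    size = estimate_weight_gib(N_PARAMS, wbits=4, groupsize=groupsize)
    print(f"groupsize {groupsize}: about {size:.1f} GiB for weights alone")
```

Under these assumptions the 1024g file needs slightly less memory than the 128g files, which is why it is the most likely candidate to fit on a single 40GB card.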

If you want to try CPU inference instead, check out my GGML repo: [TheBloke/alpaca-lora-65B-GGML](https://huggingface.co/TheBloke/alpaca-lora-65B-GGML).

## Provided files

Three files are provided, in separate branches.

* `alpaca-lora-65B-GPTQ-4bit-128g.no-act-order.safetensors` - branch main
  * Will require ~40GB of VRAM, meaning you'll need an A100 or 2 x 24GB cards.
  * Parameters: Groupsize = 128g. No act-order.
  * Command used to create the GPTQ:
    ```
    CUDA_VISIBLE_DEVICES=0 python3 llama.py alpaca-lora-65B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors alpaca-lora-65B-GPTQ-4bit-128g.no-act-order.safetensors
    ```
* `alpaca-lora-65B-GPTQ-4bit-128g.safetensors` - branch gptq-4bit-128g-actorder_True
  * Parameters: Groupsize = 128g. act-order.
  * Command used to create the GPTQ:
    ```
    CUDA_VISIBLE_DEVICES=0 python3 llama.py alpaca-lora-65B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors alpaca-lora-65B-GPTQ-4bit-128g.safetensors
    ```
* `alpaca-lora-65B-GPTQ-4bit-1024g.safetensors` - branch gptq-4bit-1024g-actorder_True
  * Parameters: Groupsize = 1024g. act-order.
  * Command used to create the GPTQ:
    ```
    CUDA_VISIBLE_DEVICES=0 python3 llama.py alpaca-lora-65B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 1024 --save_safetensors alpaca-lora-65B-GPTQ-4bit-1024g.safetensors
    ```
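
Because each file lives in its own branch, a download has to name the branch as well as the file. Here is a minimal sketch using `hf_hub_download` from the `huggingface_hub` library, with the branch passed as `revision`. The repo ID `TheBloke/alpaca-lora-65B-GPTQ-4bit` is an assumption, since it is not stated in this README, so replace it with the ID of the repo you are viewing. The helper names `download_args` and `download` are made up for illustration.

```python
# Branch -> file mapping, copied from the list above.
FILES_BY_BRANCH = {
    "main": "alpaca-lora-65B-GPTQ-4bit-128g.no-act-order.safetensors",
    "gptq-4bit-128g-actorder_True": "alpaca-lora-65B-GPTQ-4bit-128g.safetensors",
    "gptq-4bit-1024g-actorder_True": "alpaca-lora-65B-GPTQ-4bit-1024g.safetensors",
}

# ASSUMPTION: this repo ID is not given in the README; adjust as needed.
REPO_ID = "TheBloke/alpaca-lora-65B-GPTQ-4bit"


def download_args(branch):
    """Build the keyword arguments for hf_hub_download for one branch."""
    if branch not in FILES_BY_BRANCH:
        raise ValueError(f"unknown branch: {branch!r}")
    return {
        "repo_id": REPO_ID,
        "filename": FILES_BY_BRANCH[branch],
        "revision": branch,  # the branch name selects which file is visible
    }


def download(branch):
    """Download the file for `branch` and return its local path."""
    # Imported here so the mapping above can be used without the library.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(**download_args(branch))
```

Calling `download("gptq-4bit-1024g-actorder_True")` would fetch the 1024g file. These files are tens of gigabytes each, so check free disk space first.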

## How to run in `text-generation-webui`

Please see one of my more recent repos for instructions on loading GPTQ models in text-generation-webui.

<!-- footer start -->
## Discord

For further support, and discussions on these models and AI in general, join us at:

[TheBloke AI's Discord server](https://discord.gg/Jq4vkcDakD)

## Thanks, and how to contribute.

Thanks to the [chirper.ai](https://chirper.ai) team!

I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.

If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.

Donors will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.

* Patreon: https://patreon.com/TheBlokeAI
* Ko-Fi: https://ko-fi.com/TheBlokeAI

**Patreon special mentions**: Aemon Algiz, Dmitriy Samsonov, Nathan LeClaire, Trenton Dambrowitz, Mano Prime, David Flickinger, vamX, Nikolai Manek, senxiiz, Khalefa Al-Ahmad, Illia Dulskyi, Jonathan Leane, Talal Aujan, V. Lukas, Joseph William Delisle, Pyrater, Oscar Rangel, Lone Striker, Luke Pendergrass, Eugene Pentland, Sebastain Graf, Johann-Peter Hartman.

Thank you to all my generous patrons and donors!
<!-- footer end -->
# Original model card not provided

No model card was provided in [chansung's original repository](https://huggingface.co/chansung/alpaca-lora-65b).

Based on the name, I assume this is the result of fine-tuning on the original GPT-3.5 Alpaca dataset. It is unknown whether the original Stanford data was used, or the [cleaned tloen/alpaca-lora variant](https://github.com/tloen/alpaca-lora).