Update README.md
README.md
CHANGED
@@ -17,10 +17,15 @@ It was created by merging the deltas provided in the above repo with the original
 
 It was then quantized to 4bit, groupsize 128g, using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).
 
-
+VRAM usage will depend on the number of tokens returned. Below approximately 1000 returned tokens it will use less than 24GB of VRAM, but at 1000+ tokens it will exceed the VRAM of a 24GB card.
 
-RAM and VRAM usage at the end of a
+RAM and VRAM usage at the end of a 670 token response in `text-generation-webui`: **5.2GB RAM, 20.7GB VRAM**
 ![Screenshot of RAM and VRAM Usage](https://i.imgur.com/Sl8SmBH.png)
+
+RAM and VRAM usage after about 1500 tokens: **5.2GB RAM, 30.0GB VRAM**
+
+![Screenshot of RAM and VRAM usage after about 1500 tokens](https://i.imgur.com/PBNtvwf.png)
+
+If you want a model that should always stay under 24GB, use this one, provided by MetaIX, instead:
+
+[GPT4 Alpaca Lora 30B GPTQ 4bit without groupsize](https://huggingface.co/MetaIX/GPT4-X-Alpaca-30B-Int4)
 
 ## Provided files
 
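To get a feel for how quickly those figures approach the limit of a 24GB card, the two measurements above can be interpolated linearly. This is only a back-of-the-envelope sketch built from the numbers reported in this README, and it assumes VRAM grows roughly linearly with the number of generated tokens; real usage also depends on prompt length, the inference code, and anything else running on the GPU:

```python
# Back-of-the-envelope check of the VRAM figures quoted above.
# Assumes VRAM grows roughly linearly with the number of generated tokens;
# the two measurements reported in this README are the only data points.

measured_gb = {
    670: 20.7,   # tokens generated -> total VRAM in GB (from the README)
    1500: 30.0,
}

(t1, v1), (t2, v2) = sorted(measured_gb.items())

# Approximate per-token VRAM growth between the two measurements.
gb_per_token = (v2 - v1) / (t2 - t1)

# Estimate how many generated tokens fit before a 24GB card is full.
budget_gb = 24.0
tokens_at_budget = t1 + (budget_gb - v1) / gb_per_token

print(f"~{gb_per_token * 1000:.1f} MB of extra VRAM per generated token")
print(f"24GB reached after roughly {tokens_at_budget:.0f} tokens")
```

By that estimate the 24GB mark is crossed a little before 1000 generated tokens, consistent with the "approximately 1000 tokens" guidance above; capping responses at around 900 new tokens should keep this 128g build inside a 24GB card.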