TheBloke committed
Commit 27f7a7e · 1 Parent(s): 95c93be

Update README.md

Files changed (1)
  1. README.md +7 -2
README.md CHANGED
@@ -17,10 +17,15 @@ It was created by merging the deltas provided in the above repo with the origina
 
 It was then quantized to 4bit, groupsize 128g, using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).
 
-In my testing this model uses 19 - 21GB of VRAM for inference and therefore should run on any 24GB VRAM card.
+VRAM usage will depend on the tokens returned. Below approximately 1000 tokens returned it will use <24GB VRAM, but at 1000+ tokens it will exceed the VRAM of a 24GB card.
 
-RAM and VRAM usage at the end of a 2000 token response in `text-generation-webui` : **5.2GB RAM, 20.7GB VRAM**
+RAM and VRAM usage at the end of a 670 token response in `text-generation-webui` : **5.2GB RAM, 20.7GB VRAM**
 ![Screenshot of RAM and VRAM Usage](https://i.imgur.com/Sl8SmBH.png)
+RAM and VRAM usage after about 1500 tokens: **5.2GB RAM, 30.0GB VRAM**
+![screenshot](https://i.imgur.com/PBNtvwf.png)
+
+If you want a model that should always stay under 24GB, use this one, provided by MetalX, instead:
+[GPT4 Alpaca Lora 30B GPTQ 4bit without groupsize](https://huggingface.co/MetaIX/GPT4-X-Alpaca-30B-Int4)
 
 ## Provided files
 
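The updated README ties VRAM usage to the number of tokens generated, since the attention cache grows as the response gets longer. Below is a minimal sketch of how one might verify peak VRAM for a given response length, assuming a CUDA GPU with PyTorch installed; `generate_fn` is a hypothetical placeholder for however generation is actually invoked (for example a direct `model.generate()` call), and is not part of this repo or of `text-generation-webui`.

```python
import torch

def report_peak_vram(generate_fn):
    """Run one generation call and report peak VRAM in GB.

    Useful for checking whether a given response length stays within
    a 24GB card. `generate_fn` is a placeholder callable supplied by
    the caller (assumption, not an API of this repo).
    """
    # Clear the running peak so only this generation is measured
    torch.cuda.reset_peak_memory_stats()

    output = generate_fn()

    # Wait for all CUDA work to finish before reading the counter
    torch.cuda.synchronize()
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    print(f"Peak VRAM during generation: {peak_gb:.1f} GB")
    return output
```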