Update README.md
README.md CHANGED
@@ -20,11 +20,13 @@ This repo contains 4bit GPTQ models for GPU inference, quantised using [GPTQ-for
 
 ## PERFORMANCE ISSUES
 
-I
+For reasons I can't yet understand, there are performance problems with these 4bit GPTQs that I have not experienced with any other GPTQ 7B or 13B model.
 
-
+I have re-made the GPTQs several times, trying various versions of the GPTQ-for-LLaMa code, but I currently can't resolve the issue.
 
-
+Using the act-order.safetensors file with the Triton GPTQ-for-LLaMa code performs acceptably for me, e.g. 10-13 tokens/s testing on a 4090. But the no-act-order.safetensors file, tested with the older CUDA oobabooga GPTQ-for-LLaMa code, returns only 4 tokens/s.
+
+I will keep investigating and trying to work out what is happening here. But for the moment, if you're not able to use Triton GPTQ-for-LLaMa, you may want to try another 7B GPTQ model.
 
 ## GIBBERISH OUTPUT IN `text-generation-webui`?
 
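The tokens/s figures quoted in the new text can be checked with a simple timing loop. Below is a minimal sketch, assuming the quantised model has already been loaded into Hugging Face-style `model` and `tokenizer` objects (the loading call itself is omitted, since it differs between the Triton and CUDA GPTQ-for-LLaMa branches); `tokens_per_second` is a hypothetical helper for illustration, not part of any of the tools mentioned above:

```python
import time

def tokens_per_second(model, tokenizer, prompt, max_new_tokens=128):
    """Generate max_new_tokens from prompt and return the generation speed.

    Hypothetical helper: assumes a transformers-style model/tokenizer pair
    already loaded from one of the quantised .safetensors files.
    """
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    start = time.perf_counter()
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    elapsed = time.perf_counter() - start

    # Count only the newly generated tokens, not the prompt tokens.
    new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
    return new_tokens / elapsed

# Example usage, once a model has been loaded:
# print(tokens_per_second(model, tokenizer, "Tell me about llamas."))
```

Running the same prompt against the act-order and no-act-order files, each loaded through its respective code path, should make a regression like 10-13 tokens/s versus 4 tokens/s easy to spot.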