Getting size mismatch for model.layers with alpaca-lora-65B-GPTQ-4bit-1024g.safetensors
Setup as described, minus a couple of workarounds for bad parameters that another user pointed out:
Using the model directory name, rather than:
"--model medalpaca-13B-GPTQ-4bit"
and changed the GPTQ setup command for alpaca-lora-65B-GPTQ-4bit-1024g.safetensors to use:
"TheBloke/alpaca-lora-65B-HF"
rather than: alpaca-lora-65B-HF
...This took about 5 hours to run...
However, when using:
$ python server.py --model TheBloke_alpaca-lora-65B-GPTQ-4bit --wbits 4 --groupsize 1024 --model_type Llama
with the quantized model alpaca-lora-65B-GPTQ-4bit-1024g.safetensors, I'm now getting a screen full of this for every layer:
"size mismatch for model.layers.79.mlp.up_proj.scales: copying a param with shape torch.Size([64, 22016]) from checkpoint, the shape in current model is torch.Size([8, 22016])."
Should I not be using groupsize 1024 with this? Any feedback?
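For reference, a rough way to sanity-check those shapes (assuming LLaMA-65B's hidden size of 8192, and that the GPTQ scales tensor has one row per group of input features):
# up_proj's in_features is the hidden size (8192 on 65B); scales rows = in_features / groupsize
$ echo $((8192 / 128))    # 64 -> matches the checkpoint that actually got loaded (a 128g file)
$ echo $((8192 / 1024))   # 8  -> what a model built with --groupsize 1024 expects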
NM, seems like that command is defaulting to:
alpaca-lora-65B-GPTQ-4bit-128g.safetensors
However, when forcing the use of alpaca-lora-65B-GPTQ-4bit-1024g.safetensors,
I get "Could not find the quantized model in .pt or .safetensors format, exiting..."
I'd appreciate any feedback you can give, to help get other people on the right track.
> and changed the GPTQ setup command for alpaca-lora-65B-GPTQ-4bit-1024g.safetensors to use:
> "TheBloke/alpaca-lora-65B-HF"
> rather than: alpaca-lora-65B-HF
> ...This took about 5 hours to run...
I'm confused. You didn't re-make the GPTQ, did you? The "command used to create" info was given just for reference, to explain how I made the files, not for you to re-run yourself. If you re-made the files, you don't need this repo! :)
> $ python server.py --model TheBloke_alpaca-lora-65B-GPTQ-4bit --wbits 4 --groupsize 1024 --model_type Llama
This is all you need to run to use the files in this repo.
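In other words, just download the files from this repo (e.g. with text-generation-webui's download-model.py helper; the repo name below is inferred from your TheBloke_alpaca-lora-65B-GPTQ-4bit directory name) and point server.py at the resulting model directory:
$ python download-model.py TheBloke/alpaca-lora-65B-GPTQ-4bit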
> with the quantized model alpaca-lora-65B-GPTQ-4bit-1024g.safetensors, I'm now getting a screen full of this for every layer:
> "size mismatch for model.layers.79.mlp.up_proj.scales: copying a param with shape torch.Size([64, 22016]) from checkpoint, the shape in current model is torch.Size([8, 22016])."
Assuming you did download the files from my repo and not re-make them, then check that the sha256sum is correct for the model file you're using. They're large files, which increases the chance of a glitch during download. If you did re-make the files, then this could well be some error during GPTQ creation, which will be hard to diagnose without knowing exactly what you ran and which version/fork of GPTQ-for-LLaMa you used.
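For example, something like this, then compare the result against the SHA256 shown for the file on the repo's "Files and versions" page (adjust the path to wherever you downloaded it):
$ cd models/TheBloke_alpaca-lora-65B-GPTQ-4bit
$ sha256sum alpaca-lora-65B-GPTQ-4bit-1024g.safetensors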
> NM, seems like that command is defaulting to:
> alpaca-lora-65B-GPTQ-4bit-128g.safetensors
If multiple model files are in the directory, it'll load the first one it finds. Remove any model files you don't want from the model directory.
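For example (the destination below is just a placeholder; anywhere outside the model directory is fine):
$ ls -lh models/TheBloke_alpaca-lora-65B-GPTQ-4bit/*.safetensors
# if you only want to use the 1024g version, move the 128g file out of the way:
$ mv models/TheBloke_alpaca-lora-65B-GPTQ-4bit/alpaca-lora-65B-GPTQ-4bit-128g.safetensors ~/backup/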
> However, when forcing the use of alpaca-lora-65B-GPTQ-4bit-1024g.safetensors,
> I get "Could not find the quantized model in .pt or .safetensors format, exiting..."
Sounds like you've either not got the right parameters when starting text-generation-webui,
or you've set it to look in the wrong model directory. This would happen, for example, if you forgot the --wbits 4 --groupsize 1024
params to server.py.
Please show the command used to start text-generation-webui, and the contents of your model directory.
Yup, first point. I did indeed re-make the GPTQ; I misunderstood the workflow.
I used this to make it:
$ python3 llama.py "TheBloke/alpaca-lora-65B-HF" c4 --wbits 4 --true-sequential --act-order --groupsize 1024 --save_safetensors alpaca-lora-65B-GPTQ-4bit-1024g.safetensors
It ran for 5 hours.. :)
I understand. Regarding the https://github.com/qwopqwop200/GPTQ-for-LLaMa repo, there have been updates to it in the interim, so I will just start over; it's on a RunPod instance and I created a base template for it anyway. Thanks for the feedback, and sorry for misunderstanding the workflow. (I do, however, appreciate you listing the method you used to generate the GPTQ, even if I missed the intent.) I guess I went twenty steps too far.
Also, thanks for being responsive. Kudos.
OK, so surprising update....
Ran from a clean pod, and did not generate the GPTQ this time; however, same error... Here is the run command and the tail of the errors:
python server.py --model TheBloke_alpaca-lora-65B-GPTQ-4bit --wbits 4 --groupsize 1024 --model_type Llama
Gradio HTTP request redirected to localhost :)
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda116.so
Loading TheBloke_alpaca-lora-65B-GPTQ-4bit...
Found the following quantized model: models/TheBloke_alpaca-lora-65B-GPTQ-4bit/alpaca-lora-65B-GPTQ-4bit-1024g.safetensors
.....
...torch.Size([8, 8192]).
size mismatch for model.layers.79.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 1024]) from checkpoint, the shape in current model is torch.Size([22, 1024]).
size mismatch for model.layers.79.mlp.down_proj.scales: copying a param with shape torch.Size([1, 8192]) from checkpoint, the shape in current model is torch.Size([22, 8192]).
size mismatch for model.layers.79.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2752]) from checkpoint, the shape in current model is torch.Size([8, 2752]).
size mismatch for model.layers.79.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 22016]) from checkpoint, the shape in current model is torch.Size([8, 22016]).
size mismatch for model.layers.79.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2752]) from checkpoint, the shape in current model is torch.Size([8, 2752]).
size mismatch for model.layers.79.mlp.up_proj.scales: copying a param with shape torch.Size([1, 22016]) from checkpoint, the shape in current model is torch.Size([8, 22016]).
So just to clarify: both the GPTQ version that I created (from the first post) and now the one from your repo are failing with the same general "size mismatch for model.layers" errors, although the exact errors ARE different. Also note I only included the errors for the last layer; the CLI was full of them.
I'm curious for your input. I am using fresh pulls of https://github.com/oobabooga/text-generation-webui and https://github.com/qwopqwop200/GPTQ-for-LLaMa, so I'm not sure whether versioning differences between those and what you originally ran could contribute to this. And again, to be clear, I ran these commands from a newly spun-up Docker pod with no artifacts. The only carry-over is the environment variables HF_HOME and TRANSFORMERS_CACHE, which I set so I could relocate the /.cache/huggingface cache files when I ran the GPTQ, due to the Docker constraints and that directory growing to 140GB.
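In case it helps you compare against what you originally built with, here is how I can capture the exact versions I'm running (the directory paths are just how things are laid out in my pod):
$ git -C text-generation-webui rev-parse --short HEAD
$ git -C text-generation-webui/repositories/GPTQ-for-LLaMa rev-parse --short HEAD
$ pip show transformers safetensors | grep -E '^(Name|Version)'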
ADDITIONAL: forgot to add this since you asked for it; here are the contents of the model directory:
-rw-rw-rw- 1 root root 6300 Apr 28 04:47 README.md
-rw-rw-rw- 1 root root 33471316664 Apr 28 05:04 alpaca-lora-65B-GPTQ-4bit-1024g.safetensors
-rw-rw-rw- 1 root root 546 Apr 28 05:33 config.json
-rw-rw-rw- 1 root root 132 Apr 28 05:33 generation_config.json
-rw-rw-rw- 1 root root 557 Apr 28 04:47 huggingface-metadata.txt
-rw-rw-rw- 1 root root 66725 Apr 28 05:33 pytorch_model.bin.index.json
-rw-rw-rw- 1 root root 499723 Apr 28 05:33 tokenizer.model