Missing key(s) in state_dict

#2
by athu16 - opened

I get multiple size mismatch errors while trying to load the model. To my knowledge, the alpaca-native model was finetuned from the llama-13b model (given its file size of about 24 GB). Yet the model name in this repo has "alpaca7b" in it. Where can I find the original 7b alpaca model?

It is the same size as point-alpaca's weights after I applied the diffs, so it's definitely alpaca-7B.
I'm not an expert, but the issue seems to lie between GPTQ-for-LLaMa and alpaca (which is llama fine-tuned and pruned), because I saw a warning about a length mismatch when I tried to quantize it myself (the run aborted since the free Colab ran out of RAM, unfortunately).

I tried quantizing this model as well and it's not possible as-is because of the embedding size it was trained with. It would require modifying the state_dict keys to the appropriate values before converting, or it won't quantize correctly. LoRA checkpoints seem to convert fine at the moment without any extra work.
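If it helps anyone, here's a minimal sketch (checkpoint filename hypothetical) for inspecting a GPTQ checkpoint's state_dict before trying to load it, to see which zero-point key format it uses and what the scales shapes look like:

import torch

# Hypothetical path; point this at the quantized checkpoint
ckpt = torch.load("alpaca7b-4bit.pt", map_location="cpu")

# Newer GPTQ-for-LLaMa checkpoints store ".qzeros" keys; older code expects ".zeros"
print("has qzeros keys:", any(k.endswith(".qzeros") for k in ckpt))
print("has zeros keys: ", any(k.endswith(".zeros") for k in ckpt))

# The shape of a scales tensor hints at the quantization groupsize
for name, tensor in ckpt.items():
    if name.endswith("q_proj.scales"):
        print(name, tuple(tensor.shape))
        break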

As stated on the model card, this was quantized from the fine-tuned 7b model at chavinlo/alpaca-native @cecc16dc15544ee626ae3dfb9dfc5cea8851cf1e. The original alpaca-native model is available there.

ozcur changed discussion status to closed

Did you actually test this after quantizing it?

Yes. The inference script in the quant repo provided coherent results.

Here's an example invocation:

https://pastebin.com/K1XqG7Aa

ozcur changed discussion status to open

└─$ CUDA_VISIBLE_DEVICES=0 python llama_inference.py /home/me/GPT/text-generation-webui/models/alpaca-7b --wbits 4 --load /home/me/GPT/text-generation-webui/models/alpaca-7b/alpaca-7b-4bit.pt --max_length 300 --text "$(cat test_prompt.txt)"
Loading model ...
Traceback (most recent call last):
File "/home/me/GPT/text-generation-webui/repositories/GPTQ-for-LLaMa/llama_inference.py", line 108, in
model = load_quant(args.model, args.load, args.wbits)
File "/home/me/GPT/text-generation-webui/repositories/GPTQ-for-LLaMa/llama_inference.py", line 52, in load_quant
model.load_state_dict(torch.load(checkpoint))
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
Missing key(s) in state_dict: "model.layers.0.self_attn.q_proj.zeros", "model.layers.0.self_attn.k_proj.zeros", "model.layers.0.self_attn.v_proj.zeros", "model.layers.0.self_attn.o_proj.zeros", "model.layers.0.mlp.gate_proj.zeros", "model.layers.0.mlp.down_proj.zeros", "model.layers.0.mlp.up_proj.zeros", [... the same seven ".zeros" keys repeat for model.layers.1 through model.layers.31 ...].
Unexpected key(s) in state_dict: "model.layers.0.self_attn.q_proj.qzeros", "model.layers.0.self_attn.k_proj.qzeros", "model.layers.0.self_attn.v_proj.qzeros", "model.layers.0.self_attn.o_proj.qzeros", "model.layers.0.mlp.gate_proj.qzeros", "model.layers.0.mlp.down_proj.qzeros", "model.layers.0.mlp.up_proj.qzeros", [... the same seven ".qzeros" keys repeat for model.layers.1 through model.layers.31 ...].
size mismatch for model.layers.0.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.0.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.0.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.0.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.0.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.0.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.0.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
[... the same seven size mismatches repeat identically for model.layers.1 through model.layers.31 ...]

At first everybody thought it was broken, but ozcur was right, it's not (sorry about that). The checkpoint was quantized with groupsize 128, so it stores ".qzeros" keys and grouped scales, while older GPTQ-for-LLaMa code builds a model that expects per-channel ".zeros" keys, which is exactly the error above.
Anyway, there are two methods:

  1. Use GPTQ-for-LLaMa's inference script, as in the example provided on the model card:
    python llama_inference.py /path/alpaca-native-4bit --wbits 4 --groupsize 128 --load /path/alpaca-native-4bit/alpaca7b-4bit.pt --max_length 300 --text "your text"

  2. Try the gptq-group-size branch of text-generation-webui (wip, supports groupsize 128):
    python server.py --model alpaca-native-4bit --gptq-bits 4 --gptq-model-type llama

https://github.com/oobabooga/text-generation-webui/pull/530

Edit: with the new update of text-generation-webui, the command is:
python server.py --model alpaca-native-4bit --wbits 4 --model_type llama --groupsize 128
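As a sanity check, here's a minimal sketch (checkpoint path hypothetical) that infers the groupsize from the checkpoint itself. The log above shows q_proj.scales with shape [32, 4096]; for LLaMA-7B's 4096-wide q_proj that means a groupsize of 4096 / 32 = 128, which is why --groupsize 128 is needed:

import torch

ckpt = torch.load("alpaca7b-4bit.pt", map_location="cpu")  # hypothetical path
scales = ckpt["model.layers.0.self_attn.q_proj.scales"]    # shape [32, 4096] per the log
in_features = 4096                                         # LLaMA-7B hidden size
print("groupsize:", in_features // scales.shape[0])        # 4096 // 32 == 128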

Not sure what I'm missing here, but I keep getting the missing key(s) error.
This is the command I'm running, which is pretty much the same as in the README: python llama_inference.py /content/models/ozcur/alpaca-native-4bit --wbits 4 --groupsize 128 --load /content/models/ozcur/alpaca-native-4bit/alpaca7b-4bit.pt --max_length 500 --text "Instruction: What is an alpaca? How is it different from a llama?"

Can anyone help, please?

GPTQ-for-LLaMa has seen some changes lately; are you sure you're not using the default triton git branch instead of the cuda one?

Edit: Check LLaMA's entry on the textgen-webui wiki for more info.
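If you're unsure which branch your GPTQ-for-LLaMa checkout is on, something like this should confirm it (repo path hypothetical):

cd ~/text-generation-webui/repositories/GPTQ-for-LLaMa   # hypothetical location
git branch --show-current   # prints the current branch, e.g. triton or cuda
git checkout cuda           # switch to the cuda branch if needed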

Ya, I'm on the latest triton branch.
I did check the textgen wiki, but still no luck. I have this issue with other 4-bit models as well.

That's probably the issue; switch to the cuda branch.

The cuda branch also fails. I (and others) seem to be having this problem with other models as well: https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g/discussions/4
Should I be on a specific commit of the cuda branch? This stuff is developing so fast I can barely keep up!
