Mismatch in tensor dimensions

#3
by pevogam - opened

I tried this:

#:/mnt/local/notebooks/text-generation-webui> python3.9 download-model.py TheBloke/OpenAssistant-SFT-7-Llama-30B-GPTQ --branch=128-compat

Downloading the model to models/TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ_128-compat
...

and it leads to many errors (and ultimately a nonzero exit code) like

text-generation-webui-text-generation-webui-1  | 	size mismatch for model.layers.55.mlp.down_proj.qzeros: copying a param with shape torch.Size([140, 832]) from checkpoint, the shape in current model is torch.Size([1, 832]).

all of which come after the following traceback:

text-generation-webui-text-generation-webui-1  | Found the following quantized model: models/TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ_128-compat/OpenAssistant-30B-epoch7-GPTQ-4bit-128g.compat.no-act-order.safetensors
text-generation-webui-text-generation-webui-1  | Loading model ...
text-generation-webui-text-generation-webui-1  | Traceback (most recent call last):
text-generation-webui-text-generation-webui-1  |   File "/app/server.py", line 918, in <module>
text-generation-webui-text-generation-webui-1  |     shared.model, shared.tokenizer = load_model(shared.model_name)
text-generation-webui-text-generation-webui-1  |   File "/app/modules/models.py", line 150, in load_model
text-generation-webui-text-generation-webui-1  |     model = load_quantized(model_name)
text-generation-webui-text-generation-webui-1  |   File "/app/modules/GPTQ_loader.py", line 176, in load_quantized
text-generation-webui-text-generation-webui-1  |     model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
text-generation-webui-text-generation-webui-1  |   File "/app/modules/GPTQ_loader.py", line 77, in _load_quant
text-generation-webui-text-generation-webui-1  |     model.load_state_dict(safe_load(checkpoint), strict=False)
text-generation-webui-text-generation-webui-1  |   File "/app/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
text-generation-webui-text-generation-webui-1  |     raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
text-generation-webui-text-generation-webui-1  | RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:

Am I doing something wrong in how I download the model, or in the choice of branch?
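For what it's worth, the mismatched shapes line up with GPTQ group-size arithmetic: the number of `qzeros`/`scales` rows is the number of quantization groups along the input dimension. Here is a hedged sketch (assuming LLaMA-30B dimensions, hidden size 6656 and MLP size 17920, and 4-bit packing of 8 zero-points per int32, hence 6656/8 = 832 columns) that reproduces the `[140, 832]` vs `[1, 832]` mismatch above, i.e. a 128-group checkpoint loaded into a model built with no group size:

```python
# Sketch only: relate GPTQ group size to the qzeros/scales row counts in the
# error log. Dimensions assumed for LLaMA-30B: hidden 6656, MLP inner 17920.

def quant_group_rows(in_features: int, groupsize: int) -> int:
    """Number of quantization groups along the input dimension."""
    if groupsize == -1:          # no grouping: a single set of zeros/scales
        return 1
    return in_features // groupsize

# 128-compat checkpoint, down_proj input dim 17920 -> 140 rows:
assert quant_group_rows(17920, 128) == 140
# Model instantiated with groupsize -1 expects a single row:
assert quant_group_rows(17920, -1) == 1
# Attention projections (input dim 6656) with groupsize 128 -> 52 rows:
assert quant_group_rows(6656, 128) == 52
```

So a `[140, 832]` checkpoint tensor against a `[1, 832]` model tensor suggests the loader built the model with `--groupsize -1` while the weights were quantized with group size 128 (or vice versa).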

Getting a similar but slightly different error for the 1028 branch:

size mismatch for model.layers.59.mlp.up_proj.scales: copying a param with shape torch.Size([7, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
text-generation-webui-text-generation-webui-1 exited with code 1
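The `[7, 17920]` vs `[52, 17920]` mismatch fits the same pattern, if one assumes (this is a guess) that this branch was quantized with group size 1024: the `up_proj` input dimension 6656 splits into ceil(6656/1024) = 7 groups, while a model built for group size 128 expects 6656/128 = 52. A hedged sketch:

```python
# Sketch only: group counts use ceiling division when the input dimension is
# not a multiple of the group size. Assumes up_proj input dim 6656 (LLaMA-30B)
# and a hypothetical checkpoint group size of 1024.
import math

def scale_rows(in_features: int, groupsize: int) -> int:
    return 1 if groupsize == -1 else math.ceil(in_features / groupsize)

assert scale_rows(6656, 1024) == 7    # checkpoint side of the mismatch
assert scale_rows(6656, 128) == 52    # model side of the mismatch
```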

OK, I will run some tests and let you know.

@TheBloke hi, thank you for quantizing these model checkpoints! I also get a size mismatch error when trying to instantiate the latest model, the one with no group size. I can launch the server:

  • either being very specific: python server.py --gpu-memory 22500MiB --model TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ --groupsize -1 --wbits 4 --model_type llama --api --listen --listen-host=X.Y.Z.T
  • or being quite generic: python server.py --model TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ --api --listen --listen-host=X.Y.Z.T

I have an RTX 3090 and 48 GB of RAM, and I have installed the recommended GPTQ-for-LLaMa (from the docs here), but I have also experimented with the other variants.
What command-line arguments or tricks are you using to make it work?

Here are the error logs:

INFO:Loading TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ...
INFO:Found the following quantized model: models/TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ/OpenAssistant-SFT-7-Llama-30B-GPTQ-4bit.safetensors
Traceback (most recent call last):
  File "/home/XXX/text-generation-webui/server.py", line 952, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/XXX/text-generation-webui/modules/models.py", line 159, in load_model
    model = load_quantized(model_name)
  File "/home/XXX/text-generation-webui/modules/GPTQ_loader.py", line 178, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "/home/XXX/text-generation-webui/modules/GPTQ_loader.py", line 84, in _load_quant
    model.load_state_dict(safe_load(checkpoint), strict=False)
  File "/home/XXX/text-generation-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
    size mismatch for model.layers.0.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.0.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.0.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.0.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.0.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.0.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.0.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.0.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.0.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.0.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.0.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.0.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.0.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.0.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    (the same fourteen qzeros/scales mismatches repeat identically for layers 1 through 11, where the pasted log is truncated)
    size mismatch for model.layers.11.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.11.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.11.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.11.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.12.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.12.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.12.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.12.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.12.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.12.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.12.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.12.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.12.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.12.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.12.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.12.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.12.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.12.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.13.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.13.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.13.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.13.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.13.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.13.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.13.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.13.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.13.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.13.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.13.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.13.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.13.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.13.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.14.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.14.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.14.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.14.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.14.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.14.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.14.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.14.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.14.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.14.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.14.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.14.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.14.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.14.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.15.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.15.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.15.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.15.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.15.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.15.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.15.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.15.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.15.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.15.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.15.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.15.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.15.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.15.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.16.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.16.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.16.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.16.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.16.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.16.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.16.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.16.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.16.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.16.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.16.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.16.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.16.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.16.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.17.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.17.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.17.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.17.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.17.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.17.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.17.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.17.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.17.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.17.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.17.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.17.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.17.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.17.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.18.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.18.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.18.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.18.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.18.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.18.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.18.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.18.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.18.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.18.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.18.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.18.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.18.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.18.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.19.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.19.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.19.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.19.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.19.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.19.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.19.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.19.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.19.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.19.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.19.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.19.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.19.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.19.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.20.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.20.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.20.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.20.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.20.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.20.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.20.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.20.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.20.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.20.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.20.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.20.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.20.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.20.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.21.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.21.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.21.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.21.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.21.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.21.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.21.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.21.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.21.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.21.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.21.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.21.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.21.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.21.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.22.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.22.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.22.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.22.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.22.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.22.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.22.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.22.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.22.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.22.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.22.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.22.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.22.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.22.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.23.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.23.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.23.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.23.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.23.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.23.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.23.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.23.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.23.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.23.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.23.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.23.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.23.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.23.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.24.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.24.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.24.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.24.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.24.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.24.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.24.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.24.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.24.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.24.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.24.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.24.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.24.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.24.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.25.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.25.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.25.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.25.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.25.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.25.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.25.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.25.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.25.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.25.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.25.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.25.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.25.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.25.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.26.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.26.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.26.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.26.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.26.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.26.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.26.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.26.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.26.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.26.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.26.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.26.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.26.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.26.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.27.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.27.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.27.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.27.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.27.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.27.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.27.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.27.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.27.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.27.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.27.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.27.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.27.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.27.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.28.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.28.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.28.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.28.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.28.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.28.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.28.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.28.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.28.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.28.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.28.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.28.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.28.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.28.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.29.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.29.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.29.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.29.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.29.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.29.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.29.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.29.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.29.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.29.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.29.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.29.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.29.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.29.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.30.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.30.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.30.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.30.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.30.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.30.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.30.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.30.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.30.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.30.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.30.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.30.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.30.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.30.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.31.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.31.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.31.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.31.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.31.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.31.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.31.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.31.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.31.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.31.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.31.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.31.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.31.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.31.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.32.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.32.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.32.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.32.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.32.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.32.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.32.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.32.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.32.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.32.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.32.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.32.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.32.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.32.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.33.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.33.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.33.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.33.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.33.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.33.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.33.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.33.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.33.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.33.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.33.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.33.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.33.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.33.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.34.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.34.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.34.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.34.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.34.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.34.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.34.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.34.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.34.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.34.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.34.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.34.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.34.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.34.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.35.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.35.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.35.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.35.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.35.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.35.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.35.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
    size mismatch for model.layers.35.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
    size mismatch for model.layers.35.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
    size mismatch for model.layers.35.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
    size mismatch for model.layers.35.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.35.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    size mismatch for model.layers.35.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
    size mismatch for model.layers.35.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
    [... identical size mismatches repeat for every self_attn (k/q/v/o_proj) and mlp (gate/up/down_proj) qzeros and scales tensor in layers 36 through 48: checkpoint shape torch.Size([1, N]) vs. expected torch.Size([52, N]) for the attention projections and gate/up projections, and torch.Size([140, N]) for down_proj ...]
    (the same qzeros/scales size mismatches, torch.Size([1, N]) in the checkpoint vs torch.Size([52, N]) or torch.Size([140, N]) in the current model, repeat identically for every remaining layer, 49 through 59)

ugh I can't win with this model! I can't seem to make a model that will work for everyone.

Please try either:

  1. One of the models in the other branches, e.g. 1024-latest; it will require more VRAM though, and may OOM on a 24GB GPU on longer responses;
  2. Or continue using the new no-group_size model, but with the 'old' GPTQ-for-LLaMa, like so:
cd text-generation-webui/repositories
mv GPTQ-for-LLaMa latest.GPTQ-for-LLaMa
git clone https://github.com/oobabooga/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
pip uninstall -y quant-cuda
python setup_cuda.py install

This will downgrade you to the earlier GfL, which is what I used to quantise the no-group_size model. I used that old version because a model quantised with the latest GfL can't do CPU offloading on the old GfL, and most people are using the old GfL because it's faster with CUDA and is bundled with text-generation-webui. But from what you're saying, the same problem now also exists in reverse: a GPTQ made with the old GfL won't load properly with the latest version!
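For what it's worth, the mismatched leading dimensions in the log line up with GPTQ group counts: with group_size 128, a projection with 6656 input features (LLaMA-30B's hidden size) stores 6656 / 128 = 52 rows of scales/zeros, the 17920-wide down_proj input gives 140 rows, and a no-group_size quantisation stores a single row (the 1 in the checkpoint shapes). A quick sanity check, using only the shapes from the log above:

```python
# Leading dim of qzeros/scales = number of quantisation groups,
# i.e. in_features // group_size (or 1 when no group size is used).
def num_groups(in_features: int, group_size: int) -> int:
    if group_size <= 0:          # groupsize -1 => one group per row
        return 1
    return in_features // group_size

# Shapes seen in the error log (LLaMA-30B: hidden 6656, MLP width 17920)
print(num_groups(6656, 128))    # 52  -> torch.Size([52, ...])
print(num_groups(17920, 128))   # 140 -> torch.Size([140, ...])
print(num_groups(6656, -1))     # 1   -> torch.Size([1, ...])
```

So a checkpoint quantised without group size ([1, ...]) is being loaded into a model built expecting group_size 128 ([52, ...] / [140, ...]), which matches the groupsize mix-up described below.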

I hope this nightmare of incompatibility will improve in the nearish future when the community moves to AutoGPTQ. But we're not quite there yet.

@TheBloke thanks for the quick reply! To clarify, I am using the "old" GPTQ-for-LLaMa (which is the recommended one in the docs) when I get this error. If I install the "new" one with the following steps:

  1. pip uninstall quant-cuda (this is the old one)
  2. cd into dir cloned from https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/cuda
  3. CC=gcc-10 CXX=g++-10 python setup_cuda.py install

I still get the same error as posted above.

@e-caste OK then I'm confused. The model definitely works with 'old' ooba GPTQ for Llama. I tested it myself, both normally and with --pre_layer, and several other people have confirmed it works.

Please show testing with ooba GfL https://github.com/oobabooga/GPTQ-for-LLaMa and show how you launch text-gen-ui, and the errors you get.

Remember to re-compile ooba GPTQ-for-LLaMa with python setup_cuda.py install when switching back to it.

Also just to be sure, please double check the SHA256SUM of the downloaded model to confirm it's downloaded OK, and that you're using the new file in the main branch:

@TheBloke sorry for any confusion, let's see if I can be more clear. Here are my steps:

  • git clone https://github.com/oobabooga/text-generation-webui
  • cd text-generation-webui
  • python3.10.6 -m venv venv
  • source venv/bin/activate
  • pip install torch torchvision torchaudio this installs PyTorch 2.0.1 with CUDA 11.7, even though I have CUDA 11.2 installed since I am on Pop!_OS 22.04 (though my NVIDIA driver version 525.105.17 is compatible with CUDA up to 12.0) -- could this be the culprit?
  • pip install -r requirements.txt
  • mkdir repositories
  • cd repositories
  • git clone https://github.com/oobabooga/GPTQ-for-LLaMa -b cuda GPTQ-for-LLaMa_oobabooga
  • ln -s GPTQ-for-LLaMa_oobabooga GPTQ-for-LLaMa the symlink is to allow text-generation-webui to find the repo dir
  • cd GPTQ-for-LLaMa
  • sudo apt install gcc-10 g++-10 this is because CUDA 11.2 is only compatible with g++ < 11
  • CC=gcc-10 CXX=g++-10 python setup_cuda.py install this installs the quant-cuda package, "old" CUDA version, with the correct version of the C/C++ compilers
  • cd ../..
  • python download-model.py oobabooga/llama-tokenizer following this line in the docs: ⚠️ The tokenizer files in the sources above may be outdated. Make sure to obtain the universal LLaMA tokenizer as described here. -- could this be the culprit?
  • python download-model.py TheBloke/OpenAssistant-SFT-7-Llama-30B-GPTQ --branch main
  • sha256sum models/TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ/OpenAssistant-SFT-7-Llama-30B-GPTQ-4bit.safetensors SHA-256 checksum is 3200d0314235e82313021303fd2be12e86a819deb6bf951d8179c84d7dd5acfd, seems correct
  • python server.py --gpu-memory 22500MiB --model TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ --groupsize -1 --wbits 4 --model_type llama --api --listen --listen-host=192.168.X.Y
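As an aside, the checksum step can also be scripted instead of using `sha256sum` (a stdlib-only sketch; the path and expected digest are the ones from the step above):

```python
# Sketch: verify the downloaded checkpoint's SHA-256 without the
# `sha256sum` CLI. Hash in chunks so a 17 GB file doesn't need to
# fit in RAM.
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

EXPECTED = "3200d0314235e82313021303fd2be12e86a819deb6bf951d8179c84d7dd5acfd"
# path = "models/TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ/OpenAssistant-SFT-7-Llama-30B-GPTQ-4bit.safetensors"
# assert sha256_of(path) == EXPECTED, "checkpoint is corrupt or incomplete"
```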

At this point I get the size mismatch error logs.

My relevant hardware is: RTX 3090, 48 GB RAM.
My OS is: Linux Pop!_OS 22.04 LTS, kernel version 6.2.6-76060206-generic.

OK thanks for the details. Can you please try without --gpu-memory ?

@TheBloke I have tried without --gpu-memory 22500MiB and after rm -rf models/oobabooga_llama-tokenizer, I always get the same size mismatch error.

Yeah OK I see the same. Really not sure what's going on here. I'll get back to you.

Ok, phew! I've figured it out and it's not the model's fault :) I thought I was going crazy there. I've done this model so many times.

It's a bug in text-gen-ui. I haven't 100% figured out what's going on, but for some reason it's not applying the model type correctly. It thinks it's not a Llama-type model.

Please create/edit text-generation-webui/models/config-user.yaml and add the following:

TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ$:
  auto_devices: false
  bf16: false
  cpu: false
  cpu_memory: 0
  disk: false
  gpu_memory_0: 0
  groupsize: None
  load_in_8bit: false
  mlock: false
  model_type: llama
  n_batch: 512
  n_gpu_layers: 0
  pre_layer: 0
  threads: 0
  wbits: '4'

Save that file, then close and re-open text-gen-ui with these args: python server.py --model TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ --api --listen (plus whatever other non-model-related args you want)
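In case you're wondering about the trailing `$` in the YAML key: my understanding is that text-gen-ui treats the keys in config-user.yaml as case-insensitive regular expressions matched against the model name, so the `$` anchors the match at the end of the name. A minimal sketch of that matching logic (my reading of the behaviour, not a copy of the actual code):

```python
import re

def config_applies(pattern: str, model_name: str) -> bool:
    """Approximation of how text-gen-ui picks a config-user.yaml entry:
    the YAML key is used as a regex matched against the model name."""
    return re.match(pattern.lower(), model_name.lower()) is not None

# The $ anchor keeps a hypothetical "...-GPTQ-v2" model directory from
# inheriting this entry by prefix match.
assert config_applies(r"TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ$",
                      "TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ")
```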

root@eb60333c3265:~/test/text-generation-webui# python server.py  --model TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ  --api --listen
INFO:Note: detected 256 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO:Note: NumExpr detected 256 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:NumExpr defaulting to 8 threads.
INFO:Gradio HTTP request redirected to localhost :)
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
INFO:Loading TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ...
INFO:Found the following quantized model: models/TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ/OpenAssistant-SFT-7-Llama-30B-GPTQ-4bit.safetensors
INFO:Loaded the model in 16.59 seconds.

Starting streaming server at ws://0.0.0.0:5005/api/v1/stream
Starting API at http://0.0.0.0:5000/api
INFO:server listening on 0.0.0.0:5005
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.

I am also trying the 1024-latest model variant, both with and without the custom oobabooga/llama-tokenizer, and I always get the maximum number of tokens of gibberish back:

user: hello!
assistant: blmerge Thom Blatte Giblblbloussmergedemmergeblbl threads松bl Tamergebl Women], blblmergebljetmergeblousmergeblmergeatoblmergemergebluousmergeblshipblbl),mergemergeivablbl.),blblboundblmerge Howardblblmergeθ‹₯blmergeBl'blblGiblbl womenblblateblmergeageblblattemergebl Ratblbl HDmergeblinemergemergemerge Blblbl Gibl Womenmergebl Giremblmergeleaseblblatblbl Blblmergeshipmergebl BlattemergemergeatoatteblblwΓ€rmergemerge.),mergebl)--blmerge Heinrichblblinglyblbl Felblbl WomenattemergeattemergeIKblblInvokeblbl TablblouslyblblοΏ½blmergeattebl.),merge Furblbl--blbl Threadblmerge.),atteblmergepportblblshipmergeatteofblbl MissourimergebloussblblThreadblblplanblblravblbl Girlsmergebl),blblatoblbl],blblakiblbl rmblbl.--blbljetblbl Russianblbl Henriblblvblbl remainingblblaticallyblbl rasblblVEblbldobblbl approblbl Rugbyblblremblblasyblblatablbl()),blblenneblblousblbl ratblbl photographblbl `--blblatableblblptblbl Ρ€blbllotblbl'blmergeoussblshipshipbl.),shipbl Bl.),bl.),atoblshipatoblatomergeblattebl Bl Womenblblrablblityblbl PotblblΓ©eblbl)-- Blblatteshipmergeatomergeato Blblship),blatte.),blatte Blbl Blmergeblatoshipblshipremblatteatobl.),),blmerge Giblmerge Felmergebl.),.),bl Gimergeblbound.),blatorembl Giboundblbl EDblbl松blblvoblbl fotografblbl SOblblientblblΓ­kblbl LagblblveblblBlblbleltblbl Georgesblbl $_blbl Thomblblmentblbl Eddblbl'_blbl OffblblDispatchblblineblbl Primblbl dispatchmergemergeshipatomergemerge Felbl.),rembl Womenbl.), Blbl Womenremblshipboundblatteremblato Blmergeato GiblshipIKmergeblatmergebl Felmerge

The 128-latest version instead, on top of producing the maximum number of tokens set in the API, throws an error: IndexError: piece id is out of range.


If I remove the "old" quant-cuda, and replace it with the one by iwalton3 (https://github.com/iwalton3/GPTQ-for-LLaMa), I can get some sensible output (e.g. user: hello! assistant: 😊 Hello there! How may I assist you today?), but I get a lot of responses which consist of just the \u200b zero-width space character (e.g. user: write a haiku about the Python programming language assistant: \u200b user: please assistant: 🌱Python grows In code it blooms and flows A garden of logic) -- it really likes emojis apparently.

If I try to run the checkpoint in the main branch, it gives the same size mismatch errors as before.


If I remove the gptq-llama package and install the "new" quant-cuda, the model from the main branch still crashes with the same error, and the 128-latest version behaves as above.

My preprompt (partially stolen from MPT-7B-Chat) is:

USERTOKEN = "<|prompter|> "
ASSISTANTTOKEN = "<|assistant|>: "
ENDTOKEN = "\n"
start_message = f"{USERTOKEN}You are a helpful chatbot. You answer questions. You are excited to be able to help the user and you will help them any way you can. You are more than an information source, you are also able to code, write poetry, write short stories, summarize text, and make jokes. You always reply after `{ASSISTANTTOKEN}`. Say okay if you understand.\n{ASSISTANTTOKEN}okay.{ENDTOKEN}"

By the way, thanks a lot for helping me. It would be extremely cool to run OpenAssistant locally!

@TheBloke thanks a lot! That worked! πŸŽ‰

Hi @TheBloke unfortunately this did not change anything for me, running in a docker container like so:

pevogam@prime:/mnt/local/notebooks/text-generation-webui> head .env 
# by default the Dockerfile specifies these versions: 3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6+PTX
# however for me to work i had to specify the exact version for my card ( 2060 ) it was 7.5
# https://developer.nvidia.com/cuda-gpus you can find the version for your card here
TORCH_CUDA_ARCH_LIST=7.5

# these commands worked for me with roughly 4.5GB of vram
CLI_ARGS=--model TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ --api --listen

pevogam@prime:/mnt/local/notebooks/text-generation-webui> cat models/config-user.yaml 
TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ$:
  auto_devices: false
  bf16: false
  cpu: false
  cpu_memory: 0
  disk: false
  gpu_memory_0: 0
  groupsize: None
  load_in_8bit: false
  mlock: false
  model_type: llama
  n_batch: 512
  n_gpu_layers: 0
  pre_layer: 0
  threads: 0
  wbits: '4'
pevogam@prime:/mnt/local/notebooks/text-generation-webui> docker compose up --build
[+] Building 1.3s (37/37) FINISHED                                                                                                                                                                    
 => [internal] load build definition from Dockerfile                                                                                                                                             0.1s
 => => transferring dockerfile: 115B                                                                                                                                                             0.0s
 => [internal] load .dockerignore                                                                                                                                                                0.0s
 => => transferring context: 123B                                                                                                                                                                0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:11.8.0-runtime-ubuntu22.04                                                                                                                0.9s
 => [internal] load metadata for docker.io/nvidia/cuda:11.8.0-devel-ubuntu22.04                                                                                                                  0.0s
 => [stage-1  1/24] FROM docker.io/nvidia/cuda:11.8.0-runtime-ubuntu22.04@sha256:9b9ce0e128463d147a58b5013255c60e7eb725141f37c197b1ddee5aeb7e4161                                                0.0s
 => [internal] load build context                                                                                                                                                                0.0s
 => => transferring context: 5.00kB                                                                                                                                                              0.0s
 => [builder 1/7] FROM docker.io/nvidia/cuda:11.8.0-devel-ubuntu22.04                                                                                                                            0.0s
 => CACHED [stage-1  2/24] RUN apt-get update &&     apt-get install --no-install-recommends -y libportaudio2 libasound-dev git python3 python3-pip make g++ &&     rm -rf /var/lib/apt/lists/*  0.0s
 => CACHED [stage-1  3/24] RUN --mount=type=cache,target=/root/.cache/pip pip3 install virtualenv                                                                                                0.0s
 => CACHED [stage-1  4/24] RUN mkdir /app                                                                                                                                                        0.0s
 => CACHED [stage-1  5/24] WORKDIR /app                                                                                                                                                          0.0s
 => CACHED [stage-1  6/24] RUN test -n "HEAD" && git reset --hard HEAD || echo "Using provided webui source"                                                                                     0.0s
 => CACHED [stage-1  7/24] RUN virtualenv /app/venv                                                                                                                                              0.0s
 => CACHED [stage-1  8/24] RUN . /app/venv/bin/activate &&     pip3 install --upgrade pip setuptools &&     pip3 install torch torchvision torchaudio                                            0.0s
 => CACHED [builder 2/7] RUN apt-get update &&     apt-get install --no-install-recommends -y git vim build-essential python3-dev python3-venv &&     rm -rf /var/lib/apt/lists/*                0.0s
 => CACHED [builder 3/7] RUN git clone https://github.com/oobabooga/GPTQ-for-LLaMa /build                                                                                                        0.0s
 => CACHED [builder 4/7] WORKDIR /build                                                                                                                                                          0.0s
 => CACHED [builder 5/7] RUN python3 -m venv /build/venv                                                                                                                                         0.0s
 => CACHED [builder 6/7] RUN . /build/venv/bin/activate &&     pip3 install --upgrade pip setuptools &&     pip3 install torch torchvision torchaudio &&     pip3 install -r requirements.txt    0.0s
 => CACHED [builder 7/7] RUN . /build/venv/bin/activate &&     python3 setup_cuda.py bdist_wheel -d .                                                                                            0.0s
 => CACHED [stage-1  9/24] COPY --from=builder /build /app/repositories/GPTQ-for-LLaMa                                                                                                           0.0s
 => CACHED [stage-1 10/24] RUN . /app/venv/bin/activate &&     pip3 install /app/repositories/GPTQ-for-LLaMa/*.whl                                                                               0.0s
 => CACHED [stage-1 11/24] COPY extensions/api/requirements.txt /app/extensions/api/requirements.txt                                                                                             0.0s
 => CACHED [stage-1 12/24] COPY extensions/elevenlabs_tts/requirements.txt /app/extensions/elevenlabs_tts/requirements.txt                                                                       0.0s
 => CACHED [stage-1 13/24] COPY extensions/google_translate/requirements.txt /app/extensions/google_translate/requirements.txt                                                                   0.0s
 => CACHED [stage-1 14/24] COPY extensions/silero_tts/requirements.txt /app/extensions/silero_tts/requirements.txt                                                                               0.0s
 => CACHED [stage-1 15/24] COPY extensions/whisper_stt/requirements.txt /app/extensions/whisper_stt/requirements.txt                                                                             0.0s
 => CACHED [stage-1 16/24] RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/api && pip3 install -r requirements.txt                                      0.0s
 => CACHED [stage-1 17/24] RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/elevenlabs_tts && pip3 install -r requirements.txt                           0.0s
 => CACHED [stage-1 18/24] RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/google_translate && pip3 install -r requirements.txt                         0.0s
 => CACHED [stage-1 19/24] RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/silero_tts && pip3 install -r requirements.txt                               0.0s
 => CACHED [stage-1 20/24] RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/whisper_stt && pip3 install -r requirements.txt                              0.0s
 => CACHED [stage-1 21/24] COPY requirements.txt /app/requirements.txt                                                                                                                           0.0s
 => CACHED [stage-1 22/24] RUN . /app/venv/bin/activate &&     pip3 install -r requirements.txt                                                                                                  0.0s
 => CACHED [stage-1 23/24] RUN cp /app/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so /app/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so      0.0s
 => CACHED [stage-1 24/24] COPY . /app/                                                                                                                                                          0.0s
 => exporting to image                                                                                                                                                                           0.1s
 => => exporting layers                                                                                                                                                                          0.0s
 => => writing image sha256:cebe69eb2dd3d8126d46913e0c05e48da6b2a152d4bd2e77382fdfdeaf3994e2                                                                                                     0.0s
 => => naming to docker.io/library/text-generation-webui-text-generation-webui                                                                                                                   0.0s
[+] Running 1/1
 βœ” Container text-generation-webui-text-generation-webui-1  Recreated                                                                                                                            0.1s 
Attaching to text-generation-webui-text-generation-webui-1
text-generation-webui-text-generation-webui-1  | 
text-generation-webui-text-generation-webui-1  | ==========
text-generation-webui-text-generation-webui-1  | == CUDA ==
text-generation-webui-text-generation-webui-1  | ==========
text-generation-webui-text-generation-webui-1  | 
text-generation-webui-text-generation-webui-1  | CUDA Version 11.8.0
text-generation-webui-text-generation-webui-1  | 
text-generation-webui-text-generation-webui-1  | Container image Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
text-generation-webui-text-generation-webui-1  | 
text-generation-webui-text-generation-webui-1  | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
text-generation-webui-text-generation-webui-1  | By pulling and using the container, you accept the terms and conditions of this license:
text-generation-webui-text-generation-webui-1  | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
text-generation-webui-text-generation-webui-1  | 
text-generation-webui-text-generation-webui-1  | A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
text-generation-webui-text-generation-webui-1  | 
text-generation-webui-text-generation-webui-1  | Gradio HTTP request redirected to localhost :)
text-generation-webui-text-generation-webui-1  | bin /app/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
text-generation-webui-text-generation-webui-1  | Loading TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ...
text-generation-webui-text-generation-webui-1  | Found the following quantized model: models/TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ/OpenAssistant-30B-epoch7-GPTQ-4bit-1024g.compat.no-act-order.safetensors
text-generation-webui-text-generation-webui-1  | Loading model ...
text-generation-webui-text-generation-webui-1  | Traceback (most recent call last):
text-generation-webui-text-generation-webui-1  |   File "/app/server.py", line 918, in <module>
text-generation-webui-text-generation-webui-1  |     shared.model, shared.tokenizer = load_model(shared.model_name)
text-generation-webui-text-generation-webui-1  |   File "/app/modules/models.py", line 150, in load_model
text-generation-webui-text-generation-webui-1  |     model = load_quantized(model_name)
text-generation-webui-text-generation-webui-1  |   File "/app/modules/GPTQ_loader.py", line 176, in load_quantized
text-generation-webui-text-generation-webui-1  |     model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
text-generation-webui-text-generation-webui-1  |   File "/app/modules/GPTQ_loader.py", line 77, in _load_quant
text-generation-webui-text-generation-webui-1  |     model.load_state_dict(safe_load(checkpoint), strict=False)
text-generation-webui-text-generation-webui-1  |   File "/app/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
text-generation-webui-text-generation-webui-1  |     raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
text-generation-webui-text-generation-webui-1  | RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
text-generation-webui-text-generation-webui-1  | 	size mismatch for model.layers.0.self_attn.k_proj.qzeros: copying a param with shape torch.Size([7, 832]) from checkpoint, the shape in current model is torch.Size([1, 832]).
text-generation-webui-text-generation-webui-1  | 	size mismatch for model.layers.0.self_attn.k_proj.scales: copying a param with shape torch.Size([7, 6656]) from checkpoint, the shape in current model is torch.Size([1, 6656]).
text-generation-webui-text-generation-webui-1  | 	size mismatch for model.layers.0.self_attn.o_proj.qzeros: copying a param with shape torch.Size([7, 832]) from checkpoint, the shape in current model is torch.Size([1, 832]).
text-generation-webui-text-generation-webui-1  | 	size mismatch for model.layers.0.self_attn.o_proj.scales: copying a param with shape torch.Size([7, 6656]) from checkpoint, the shape in current model is torch.Size([1, 6656]).
text-generation-webui-text-generation-webui-1  | 	size mismatch for model.layers.0.self_attn.q_proj.qzeros: copying a param with shape torch.Size([7, 832]) from checkpoint, the shape in current model is torch.Size([1, 832]).
text-generation-webui-text-generation-webui-1  | 	size mismatch for model.layers.0.self_attn.q_proj.scales: copying a param with shape torch.Size([7, 6656]) from checkpoint, the shape in current model is torch.Size([1, 6656]).
text-generation-webui-text-generation-webui-1  | 	size mismatch for model.layers.0.self_attn.v_proj.qzeros: copying a param with shape torch.Size([7, 832]) from checkpoint, the shape in current model is torch.Size([1, 832]).
text-generation-webui-text-generation-webui-1  | 	size mismatch for model.layers.0.self_attn.v_proj.scales: copying a param with shape torch.Size([7, 6656]) from checkpoint, the shape in current model is torch.Size([1, 6656]).
text-generation-webui-text-generation-webui-1  | 	size mismatch for model.layers.0.mlp.down_proj.qzeros: copying a param with shape torch.Size([18, 832]) from checkpoint, the shape in current model is torch.Size([1, 832]).
text-generation-webui-text-generation-webui-1  | 	size mismatch for model.layers.0.mlp.down_proj.scales: copying a param with shape torch.Size([18, 6656]) from checkpoint, the shape in current model is torch.Size([1, 6656]).

Could it be that the user config file is not taking effect for some reason?

You're running the 1024g model, so either change to using the model in main, or else change:

 groupsize: None

to:

 groupsize: 1024

in the .yaml file.
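For what it's worth, the shapes in those size-mismatch messages follow directly from the groupsize, which is how you can tell which file was actually loaded. In standard GPTQ packing (as I understand it), qzeros/scales carry one row per quantisation group, i.e. ceil(in_features / groupsize), with groupsize -1 meaning a single group; the 832 columns are the 6656 output dims packed 8-per-int32 at 4 bits. A quick sanity check against the logged shapes:

```python
import math

def gptq_qzeros_shape(in_features: int, out_features: int,
                      groupsize: int, wbits: int = 4):
    """Expected qzeros shape for a GPTQ linear layer, assuming standard
    packing: one row per quantisation group, 32 // wbits values per int32."""
    groups = 1 if groupsize == -1 else math.ceil(in_features / groupsize)
    return (groups, out_features * wbits // 32)

# 30B LLaMA: hidden_size = 6656, MLP intermediate_size = 17920
assert gptq_qzeros_shape(6656, 6656, 1024) == (7, 832)    # k_proj in the 1024g file
assert gptq_qzeros_shape(17920, 6656, 1024) == (18, 832)  # down_proj in the 1024g file
assert gptq_qzeros_shape(17920, 6656, 128) == (140, 832)  # down_proj in the 128g file
assert gptq_qzeros_shape(6656, 6656, -1) == (1, 832)      # what groupsize: None builds
```

So "[7, 832] in the checkpoint vs [1, 832] in the current model" is exactly "file quantised with groups, loader configured with groupsize -1".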

Ah good catch, thanks a lot! Problem is indeed fixed.

pevogam changed discussion status to closed
