size mismatch for model.layers

by PandaEspresso

When I load the model, I get size mismatch errors for every layer, from layer 0 to layer 59.
Any idea what could be causing this? I can load your 13B Wizard model fine.

2023-05-30 22:20:29 | ERROR | stderr | size mismatch for model.layers.0.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
2023-05-30 22:20:29 | ERROR | stderr | size mismatch for model.layers.0.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
2023-05-30 22:20:29 | ERROR | stderr | size mismatch for model.layers.0.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
2023-05-30 22:20:29 | ERROR | stderr | size mismatch for model.layers.0.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
2023-05-30 22:20:29 | ERROR | stderr | size mismatch for model.layers.0.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
2023-05-30 22:20:29 | ERROR | stderr | size mismatch for model.layers.0.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
2023-05-30 22:20:29 | ERROR | stderr | size mismatch for model.layers.0.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
2023-05-30 22:20:29 | ERROR | stderr | size mismatch for model.layers.0.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
2023-05-30 22:20:29 | ERROR | stderr | size mismatch for model.layers.0.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
2023-05-30 22:20:29 | ERROR | stderr | size mismatch for model.layers.0.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
2023-05-30 22:20:29 | ERROR | stderr | size mismatch for model.layers.0.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
2023-05-30 22:20:29 | ERROR | stderr | size mismatch for model.layers.0.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
2023-05-30 22:20:29 | ERROR | stderr | size mismatch for model.layers.0.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).

I got a similar message when I tried to start the model with group size 128. With no group size (group size None) it starts fine for me.

I think that's also one of the differences between this model and the 13B one.

Yeah, group_size should be None. This error happens when it's set to 128 by mistake.

Please see the instructions in the README about setting the correct params. If you're still having trouble after that, please try updating text-generation-webui. There was a bug recently where it would overwrite "groupsize = None" with "groupsize = 128", but I believe that's been fixed now.
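For what it's worth, the numbers in the error log line up with the group count: GPTQ stores one row of qzeros/scales per quantization group, so groupsize None/-1 gives a single row, while groupsize 128 on this model (hidden size 6656, MLP intermediate size 17920) gives 6656 / 128 = 52 and 17920 / 128 = 140 rows. A quick sketch of that arithmetic (the helper name is just for illustration):

# Illustrative only: how the groupsize setting maps to qzeros/scales row counts
def gptq_group_rows(in_features: int, groupsize) -> int:
    if groupsize in (-1, None):  # "no groupsize": one group spans all input features
        return 1
    return (in_features + groupsize - 1) // groupsize  # ceiling division

print(gptq_group_rows(6656, -1))    # 1   -> what the checkpoint contains
print(gptq_group_rows(6656, 128))   # 52  -> what the model expects with groupsize=128
print(gptq_group_rows(17920, 128))  # 140 -> the down_proj rows in the log above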

Thank you guys, you are right. I found this line:

def load_quantized(model_name, wbits=4, groupsize=128, threshold=128):

Changing the groupsize default to -1 solves the problem.
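For reference, the change amounts to this (just a sketch mirroring the signature above, not code copied from any repo):

# -1 means "no groupsize", which matches how this checkpoint was quantized
def load_quantized(model_name, wbits=4, groupsize=-1, threshold=128):
    ...

Passing groupsize=-1 explicitly at the call site works too, and avoids editing the loader's defaults.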

Oh sorry, I didn't notice you were using custom code. In that case you should check out AutoGPTQ. It makes loading GPTQ models much easier, and you don't need to set params manually in your code because they're loaded from quantize_config.json.
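For context, the quantize_config.json shipped alongside a no-groupsize, act-order quant typically looks something like this (illustrative values only; check the actual file in the model repo):

{
  "bits": 4,
  "group_size": -1,
  "desc_act": true,
  "damp_percent": 0.01,
  "sym": true,
  "true_sequential": true
}

AutoGPTQ reads these fields when quantize_config=None is passed to from_quantized(), so you don't have to set wbits/groupsize yourself.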

Example AutoGPTQ code:

First download and build AutoGPTQ from source:

git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
pip install .

Then:

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM

# Path to a local directory containing the downloaded model
quantized_model_dir = "/workspace/models/TheBloke_Wizard-Vicuna-30B-Uncensored-GPTQ"

# Name of the .safetensors file in that directory, without the extension
model_basename = "Wizard-vicuna-30B-Uncensored-GPTQ-4bit.act-order"

# True uses the Triton kernel (Linux only); False uses the CUDA kernel
use_triton = False

tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, use_fast=True)

# quantize_config=None -> bits/group_size/desc_act are read from quantize_config.json in the model dir
model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir,
        use_safetensors=True,
        model_basename=model_basename,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

prompt = "Tell me about AI"
prompt_template=f'''### Human: {prompt}
### Assistant:'''

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))
