tensor 'token_embd.weight' has wrong shape

#1
by rambocoder - opened

When running ./main -ngl 32 -gqa 8 -m /tmp/llama2-70b-oasst-sft-v10.Q8_0.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<|im_start|>system\n{system_message}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant"

the following error occurs:
error loading model: create_tensor: tensor 'token_embd.weight' has wrong shape; expected 8192, 32007, got 8192, 32128, 1, 1
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/tmp/llama2-70b-oasst-sft-v10.Q8_0.gguf'
main: error: unable to load model

Any ideas how to get around this error?

The GGML version has these attributes:
llama_model_load_internal: n_vocab = 32007
llama_model_load_internal: n_embd = 8192
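
For anyone diagnosing this locally, the tensor shapes baked into a GGUF file can be dumped with the gguf-py package that ships in the llama.cpp repo (a minimal sketch; the reader API may differ between gguf-py versions):

from gguf import GGUFReader

reader = GGUFReader("/tmp/llama2-70b-oasst-sft-v10.Q8_0.gguf")
for tensor in reader.tensors:
    if tensor.name == "token_embd.weight":
        # A consistent file should report 32007 rows to match the tokenizer;
        # the broken one reports 32128 rows from the padded source model.
        print(tensor.name, tensor.shape)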

Ah, I know what this is. Andreas did something to the model to pad its vocab to a multiple of 128 tokens, but the result is that the model is actually broken. I had the same issue when I made the AWQ the other day - I had to go back to an earlier commit on the source model, before he did that padding. See here: https://huggingface.co/OpenAssistant/llama2-70b-oasst-sft-v10/discussions/5
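
For anyone who can't wait for re-made quants, undoing the padding on the source checkpoint before converting looks roughly like this (a rough sketch, assuming a single-file PyTorch checkpoint with the standard LLaMA tensor names - the real 70B checkpoint is sharded, so the same slice would have to be applied to whichever shards hold these tensors):

import torch

VOCAB_SIZE = 32007  # the vocab size the tokenizer actually defines

state = torch.load("pytorch_model.bin", map_location="cpu")
for name in ("model.embed_tokens.weight", "lm_head.weight"):
    if name in state and state[name].shape[0] > VOCAB_SIZE:
        # Drop the 121 padding rows (32128 -> 32007) on the vocab dimension.
        state[name] = state[name][:VOCAB_SIZE].clone()
torch.save(state, "pytorch_model.bin")

Checking out the pre-padding commit of the source model, as with the AWQ, is the safer route.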

I'll have to re-make the GGUF as well, I guess.

How can I convert a finetuned model with LoRA to GGUF?

When I try with convert.py, I get this error:
Loading model file lora/adapter_model.bin
Traceback (most recent call last):
  File "convert.py", line 1208, in <module>
    main()
  File "convert.py", line 1157, in main
    params = Params.load(model_plus)
  File "convert.py", line 292, in load
    params = Params.guessed(model_plus.model)
  File "convert.py", line 166, in guessed
    n_vocab, n_embd = model["model.embed_tokens.weight"].shape if "model.embed_tokens.weight" in model else model["tok_embeddings.weight"].shape
KeyError: 'tok_embeddings.weight'
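
convert.py expects a full model checkpoint, and lora/adapter_model.bin contains only the LoRA delta weights, which is why the embedding-tensor lookup fails. One common approach (a minimal sketch using PEFT's merge API; the model names and paths below are placeholders) is to merge the adapter into its base model first, then convert the merged directory:

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Placeholders - substitute your actual base model and adapter directory.
base = AutoModelForCausalLM.from_pretrained(
    "path/to/base-model", torch_dtype=torch.float16, low_cpu_mem_usage=True
)
model = PeftModel.from_pretrained(base, "lora")  # dir containing adapter_model.bin
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("merged-model")  # point convert.py at this directory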

Have you succeeded in fixing this? I get the same error with convert.py.

It is still not usable, apparently: Using llama2-70b-oasst-sft-v10.Q5_K_M.gguf in oobabooga/text-generation-webui fails with the following error:
error loading model: create_tensor: tensor 'token_embd.weight' has wrong shape; expected 8192, 32007, got 8192, 32128, 1, 1

Is there any way to fix this?
