LSX-UniWue/Betzerl_1B_wiki_preview · Error when loading adapter

25 days ago

Servus!

First and foremost thanks for your efforts in developing competitive open German LLMs!
I tried loading your adapter with the default code from the transformers documentation:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("LSX-UniWue/LLaMmlein_1B")
tokenizer = AutoTokenizer.from_pretrained("LSX-UniWue/LLaMmlein_1B")
model.load_adapter("LSX-UniWue/Betzerl_1B_wiki_preview")

as well as

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("LSX-UniWue/Betzerl_1B_wiki_preview")
tokenizer = AutoTokenizer.from_pretrained("LSX-UniWue/LLaMmlein_1B")

Both approaches fail with

...
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
    size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([32001, 2048]) from checkpoint, the shape in current model is torch.Size([32000, 2048]).
    size mismatch for lm_head.weight: copying a param with shape torch.Size([32001, 2048]) from checkpoint, the shape in current model is torch.Size([32000, 2048]).

Apart from that, I also noticed that the "base_model_name_or_path": "LSX-UniWue/llammchen_1b" entry in adapter_config.json points to https://huggingface.co/LSX-UniWue/LLaMmlein_1B.

So I guess my question is: Is https://huggingface.co/LSX-UniWue/LLaMmlein_1B the right base model?

JanPf

Data Science@CAIDAS Uni Würzburg org 25 days ago

Servus!

Yeah, I actually forgot to add the script here. It's the same issue as in this discussion: https://huggingface.co/LSX-UniWue/LLaMmlein_1B/discussions/1#673e24594dca9bce313774b8
I just added the code to this repo as well, hope it works ✌🏻

Best,
jan

P.S: https://huggingface.co/LSX-UniWue/llammchen_1b just redirects to https://huggingface.co/LSX-UniWue/LLaMmlein_1B :)

philipp-zettl

25 days ago

Thanks for the quick response and thanks for the pointer to the other adapter :)

One additional nit: this adapter requires an embedding size of 32001, not the 32064 that the code in the README suggests (32064 is the size used by LSX-UniWue/LLaMmlein_1B_chat_selected).
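
For anyone hitting the same size mismatch, here is a minimal sketch of the workaround as I understand it. It assumes the fix is to resize the base model's embeddings to the size the adapter checkpoint expects before loading the adapter. The README script itself is not reproduced in this thread, so treat this as an illustration rather than the authors' exact code. The helper names `checkpoint_vocab_size` and `load_adapter_with_resize` are made up for this example; `resize_token_embeddings` and `load_adapter` are existing transformers methods (the latter needs `peft` installed).

```python
import re

# Matches one "size mismatch" line of the RuntimeError shown above.
MISMATCH = re.compile(
    r"size mismatch for (?P<name>[\w.]+): copying a param with shape "
    r"torch\.Size\(\[(?P<ckpt>\d+), \d+\]\) from checkpoint, "
    r"the shape in current model is torch\.Size\(\[(?P<model>\d+), \d+\]\)"
)


def checkpoint_vocab_size(error_text):
    """Return the row count the checkpoint expects, or None if unclear.

    Hypothetical helper: it only reads the error text, it loads nothing.
    """
    sizes = {int(m.group("ckpt")) for m in MISMATCH.finditer(error_text)}
    # Only answer when all mismatching tensors agree on a single size.
    return sizes.pop() if len(sizes) == 1 else None


def load_adapter_with_resize(base_id, adapter_id, vocab_size):
    """Resize the base model's embeddings, then attach the adapter.

    Assumption: resizing before load_adapter is what resolves the mismatch.
    """
    # Imported lazily so the parsing helper works without transformers.
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(base_id)
    model.resize_token_embeddings(vocab_size)  # e.g. 32000 -> 32001
    model.load_adapter(adapter_id)
    return model
```

With the error above, `checkpoint_vocab_size` should report 32001, which would then be passed as `vocab_size` together with "LSX-UniWue/LLaMmlein_1B" and "LSX-UniWue/Betzerl_1B_wiki_preview".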

JanPf changed discussion status to closed 25 days ago