How to modify the weights of the LLM section

#62
by cookey39 - opened

I have a fine-tuned version of Mistral-7B-Instruct with an expanded vocabulary. I would like to use this fine-tuned version in place of the original Mistral-7B-Instruct.
I tried loading it like this:

```python
import torch
from transformers import AutoModelForVision2Seq

model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    torch_dtype=torch.float16,
    device_map="auto",
    text_config=config,  # config of the fine-tuned text model
)
```

but it fails with a size-mismatch error (screenshot: 1.png).

cookey39 changed discussion title from How to separate SIGLIP and LLM, which can be replaced with a fine-tuned version of Mistral-7B-Instruct to How to modify the weights of the LLM section

It seems like you changed the vocab size (and thus the embedding matrix size). Did you reflect that change in the config? In particular, the vocab_size attribute.
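For illustration, a minimal sketch of updating that attribute on the composite config; the Idefics2Config class is from recent transformers releases, and 63872 is the expanded vocabulary size taken from the config posted below:

```python
from transformers import Idefics2Config

# Load the composite idefics2 config and update the nested
# text config so it matches the expanded embedding matrix.
config = Idefics2Config.from_pretrained("HuggingFaceM4/idefics2-8b")
config.text_config.vocab_size = 63872  # expanded vocabulary size
```

Note that changing the config alone only changes the model skeleton that gets built; the checkpoint weights keep their original shapes.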

Thanks for the reply. This is my config for the text model:

  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.36.2",
  "use_cache": true,
  "vocab_size": 63872

Based on the error message, it seems that although I set vocab_size=63872 in the config, transformers still loaded the checkpoint's original Mistral weights (vocab_size=32003), so the shapes no longer match.
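That is expected: from_pretrained only uses the config to build the model skeleton, while the weights still come from the idefics2-8b checkpoint, whose embedding matrix has 32003 rows. One possible workaround (a minimal sketch, not an officially supported path) is to load idefics2 with its stock config, resize its embeddings, and then copy the fine-tuned Mistral weights into the text tower. The attribute paths (model.model.text_model, model.lm_head) follow the Idefics2 implementation in recent transformers releases, and "your-username/mistral-7b-instruct-expanded" is a hypothetical checkpoint name:

```python
import torch
from transformers import AutoModelForCausalLM, AutoModelForVision2Seq

# Load idefics2 with its original config so checkpoint shapes match.
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/idefics2-8b", torch_dtype=torch.float16
)

# Load the fine-tuned Mistral with the expanded vocabulary.
# "your-username/mistral-7b-instruct-expanded" is a placeholder.
mistral = AutoModelForCausalLM.from_pretrained(
    "your-username/mistral-7b-instruct-expanded", torch_dtype=torch.float16
)

# Grow idefics2's input embeddings and lm_head to the new vocab size,
# and record the new size in the nested text config.
model.resize_token_embeddings(mistral.config.vocab_size)
model.config.text_config.vocab_size = mistral.config.vocab_size

# Copy the fine-tuned decoder weights into idefics2's text tower.
# model.model.text_model is the Mistral backbone inside Idefics2Model;
# mistral.model is the backbone of MistralForCausalLM.
model.model.text_model.load_state_dict(mistral.model.state_dict())
model.lm_head.load_state_dict(mistral.lm_head.state_dict())
```

Be aware that idefics2's text tower was further trained during multimodal pretraining (and its tokenizer adds special image tokens), so overwriting it with a plain fine-tuned Mistral discards that adaptation, and the vision features may no longer line up well with the new LLM weights.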
