How to use this weight file with pure `transformer` code?

#2
by Seungyoun - opened


xtuner org

Hi @Seungyoun,
This model follows the official llava-v1.5/v1.6 format and is not in the LlavaForConditionalGeneration format.

We will provide a conversion script in about 1-2 days to convert this model to a LlavaForConditionalGeneration model.
Before that, please use the CLI or LMDeploy, as described in https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-hf#quickstart

Thank you for your prompt response, @LZHgrla.

This model follows the official llava-v1.5/v1.6 format and is not in the LlavaForConditionalGeneration format.

We will provide a conversion script in about 1-2 days to convert this model to a LlavaForConditionalGeneration model.

I need to clarify something: in your message, you mention that this model follows the format of the official LLaVA-v1.5/v1.6 and is not directly in the LlavaForConditionalGeneration format. However, my understanding is that LLaVA-v1.5/v1.6 corresponds to what is known as LlavaNextForConditionalGeneration. Could you confirm if this is the case?

Seungyoun changed discussion title from How to use this weight file with pure transformer code? to How to use this weight file with pure `transformer` code?
xtuner org

Good question! There are many formats for LLaVA models.
Here are two examples:
https://huggingface.co/liuhaotian/llava-v1.5-7b/tree/main is in llava format.
https://huggingface.co/llava-hf/llava-1.5-7b-hf/tree/main is in hf format.

This model is in llava format, although it has a -hf suffix.
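In practice, the two layouts differ mainly in how the checkpoint's tensors are named (and in config.json). As a rough illustration, the sketch below renames the keys of a sharded checkpoint's `weight_map` (the `model.safetensors.index.json` layout) from llava-style names to the names LlavaForConditionalGeneration used in transformers releases of this period. The prefix rules in `PREFIX_RULES` are an assumption modeled on transformers' own LLaVA conversion script, not something confirmed in this thread; compare them with the actual key names of both checkpoints first. Renaming the index alone is not enough: the tensors inside the shard files need the same renaming.

```python
import json

# Hypothetical prefix rules (assumption): llava-style name -> LlavaForConditionalGeneration-style name.
# Verify against the real key names of both checkpoints. Order matters: the first match wins,
# so the specific vision-tower/projector rules must come before the generic "model." rule.
PREFIX_RULES = [
    ("model.vision_tower.vision_tower.", "vision_tower."),
    ("model.mm_projector.0.", "multi_modal_projector.linear_1."),
    ("model.mm_projector.2.", "multi_modal_projector.linear_2."),
    ("model.", "language_model.model."),
    ("lm_head.", "language_model.lm_head."),
]


def rename_key(key: str) -> str:
    """Return the new name for one tensor, applying the first matching prefix rule."""
    for old, new in PREFIX_RULES:
        if key.startswith(old):
            return new + key[len(old):]
    return key  # unknown keys are left untouched so they can be inspected by hand


def rename_index(index: dict) -> dict:
    """Rename every key in a safetensors index's weight_map; shard filenames stay the same."""
    weight_map = {rename_key(k): shard for k, shard in index["weight_map"].items()}
    return {**index, "weight_map": weight_map}


if __name__ == "__main__":
    example = {
        "metadata": {"total_size": 0},
        "weight_map": {
            "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
            "model.mm_projector.0.weight": "model-00004-of-00004.safetensors",
            "lm_head.weight": "model-00004-of-00004.safetensors",
        },
    }
    print(json.dumps(rename_index(example), indent=2))
```

Keeping unmatched keys unchanged, rather than raising, makes it easy to diff the output against a known-good hf-format index and spot any rule that is missing.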

@LZHgrla
I am trying to manually fix the model's weight index mapping so that it matches the proper structure, and to fix config.json.

Does your model also follow this added_tokens.json?

{
  "<image>": 32000,
  "<pad>": 32001
}

@Seungyoun

The original vocab size is 128256.

So I think the correct token IDs should be:

{
  "<image>": 128257,
  "<pad>": 128258
}
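The arithmetic behind such a mapping can be sketched as follows, assuming new tokens are appended in order right after the base vocabulary with nothing else added first (`added_token_ids` is an illustrative helper, not part of any library). With a 32000-token vocabulary it reproduces the 32000/32001 values from the Llama-2-style added_tokens.json above. With 128256 tokens (IDs 0 to 128255) it gives 128256 and 128257, so the 128257/128258 suggested above would only hold if another token already occupies ID 128256; it is worth checking the actual IDs in the checkpoint's tokenizer files.

```python
import json


def added_token_ids(base_vocab_size: int, new_tokens: list[str]) -> dict[str, int]:
    """Assign consecutive IDs to new tokens, starting right after the base vocabulary.

    Assumption: the base vocabulary occupies IDs 0 .. base_vocab_size - 1 and the
    new tokens are appended in the given order, with nothing added before them.
    """
    return {tok: base_vocab_size + i for i, tok in enumerate(new_tokens)}


if __name__ == "__main__":
    # Llama-2-style vocabulary (32000 tokens), as in the first added_tokens.json.
    print(json.dumps(added_token_ids(32000, ["<image>", "<pad>"]), indent=2))
    # Llama-3-style vocabulary (128256 tokens).
    print(json.dumps(added_token_ids(128256, ["<image>", "<pad>"]), indent=2))
```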

Were you able to make it work?
Using the CLI version, I run into an issue with transformers 4.37, which llava requires, while all these new models need at least 4.39.

I manually fixed the issue so that it can be loaded as a Hugging Face transformers model.
@csegalin

https://huggingface.co/Seungyoun/llava-llama-3-8b-hf

It would be great if you could provide the chat template you used to train the model. Also, thank you for your wonderful work, @LZHgrla.

I added an issue in your repo.

@LZHgrla any update on the conversion script?

xtuner org

@csegalin
We will release pure-transformers and GGUF versions of the models, along with the corresponding conversion scripts, within a few days.

In the meantime, feel free to try our newly released llava-phi-3-mini model, which is available in multiple formats, including the official llava format, the pure transformers format, and the GGUF format.
https://huggingface.co/xtuner/llava-phi-3-mini-hf

I manually fixed the issue so that it can be loaded as a Hugging Face transformers model.
@csegalin

https://huggingface.co/Seungyoun/llava-llama-3-8b-hf

It would be great if you could provide the chat template you used to train the model. Also, thank you for your wonderful work, @LZHgrla.

Thanks! We used Llama-3's chat template to train the llava-llama-3-8b models.

That is:

"chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}",

An image-text example is

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

<image>What is it?<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>

AAAAAAAAA<|eot_id|>
<|start_header_id|>user<|end_header_id|>

Do you like it?<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
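To check what the template produces without loading a tokenizer, the sketch below re-implements the Jinja logic of the `chat_template` above in plain Python (`render_llama3_chat` is a hypothetical helper, not xtuner's code), assuming the Llama-3 BOS token `<|begin_of_text|>`. One detail it makes visible: the template puts nothing between `<|eot_id|>` and the next `<|start_header_id|>`, and it always ends with an open assistant header, so the line breaks after `<|eot_id|>` in the example above appear to be for display only.

```python
BOS_TOKEN = "<|begin_of_text|>"  # Llama-3 BOS token (assumed here)


def render_llama3_chat(messages: list[dict[str, str]], bos_token: str = BOS_TOKEN) -> str:
    """Mirror the chat_template above: one header/content/eot block per message,
    BOS before the first message, then an assistant header as the generation prompt."""
    parts = []
    for i, message in enumerate(messages):
        # In Jinja, `| trim` binds to message['content'] only; str.strip() does the same here.
        content = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip() + "<|eot_id|>"
        )
        if i == 0:  # corresponds to `loop.index0 == 0` in the template
            content = bos_token + content
        parts.append(content)
    # The template appends the assistant header unconditionally.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    conversation = [
        {"role": "user", "content": "<image>What is it?"},
        {"role": "assistant", "content": "AAAAAAAAA"},
        {"role": "user", "content": "Do you like it?"},
    ]
    print(repr(render_llama3_chat(conversation)))
```

If `jinja2` is installed, rendering the original template string with `jinja2.Template(template).render(messages=conversation, bos_token="<|begin_of_text|>")` should produce the same string, which is a quick way to validate the port.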

Hey, thanks.
I tried it yesterday, and I'm not sure why, but performance is worse than the first version: lots of repeated words and less accurate answers, even with the same generation parameters.
