ValueError: The input provided to the model are wrong. The number of image tokens is 1 while the number of image given to the model is 1. This prevents correct indexing and breaks batch generation.

#8
by bghira - opened

Error with the example code.

ValueError: The input provided to the model are wrong. The number of image tokens is 1 while the number of image given to the model is 1. This prevents correct indexing and breaks batch generation.

Check your prompt template
Screenshot 2024-04-17 at 3.32.06 PM.png

I'm using the exact demo code from the model card.

I also used the demo code as is and received the above error message when I entered the wrong prompt format.

I don't understand what you're trying to say.

I used the model code from the 1.6-34b model card, whose community page this is.

It has the system prompt built in.

Are you in the right place?

Yes, I used the demo code as is and it worked fine, but I modified the prompt incorrectly and the error above occurred.

I think your issue was different. I have not modified anything; I simply copied and pasted the code, ran it, and received the error.

Same problem!

And I'm using the Git version of Transformers; there's no difference between the release version and Git main.

I almost don't believe that @keunseop even got the 34b model running. Are you sure you didn't switch it to Vicuna or something?

It seems that '<image>' (ID 64000) is not in the input_ids that the processor encodes for the demo prompt.
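
One quick way to check the observation above before calling generate is to count how often ID 64000 appears in input_ids. A minimal sketch on plain Python lists, assuming 64000 is the expected image_token_index as stated in this thread; the helper names are hypothetical, and this is only a sanity check, not the validation Transformers performs internally:

```python
# Hypothetical pre-flight check, not part of Transformers: count how often the
# image placeholder ID appears in each row of input_ids.
IMAGE_TOKEN_INDEX = 64000  # default image_token_index mentioned in this thread


def count_image_tokens(input_ids, image_token_index=IMAGE_TOKEN_INDEX):
    """Return the number of image placeholder tokens in each sequence."""
    return [list(row).count(image_token_index) for row in input_ids]


def assert_image_tokens_present(input_ids, num_images, image_token_index=IMAGE_TOKEN_INDEX):
    """Raise early if the prompt does not contain one placeholder per image."""
    found = sum(count_image_tokens(input_ids, image_token_index))
    if found != num_images:
        raise ValueError(
            f"expected {num_images} image token(s) with ID {image_token_index}, found {found}"
        )
    return found


# Toy IDs: the first row contains the placeholder once; the second row has a
# different ID where the placeholder should be.
print(count_image_tokens([[1471, 64000, 144], [1471, 64003, 144]]))  # [1, 0]
```

With real processor output, inputs["input_ids"].tolist() would give the nested lists this sketch expects.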

To compare inference speeds, I ran both the Mistral 7B model and the 34B model on four V100 GPUs.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
transformers 4.40.0.dev0 requires tokenizers<0.19,>=0.14, but you have tokenizers 0.19.0 which is incompatible.

Even installing the latest Tokenizers library (I was running 0.15.2) doesn't work with the latest Transformers main branch.
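
The pip message comes down to a version range: Transformers 4.40.0.dev0 pins tokenizers>=0.14,<0.19, so 0.19.0 falls just outside it, while 0.15.2 falls inside. A deliberately simplified sketch of that comparison (the function names are hypothetical; pip itself relies on the packaging library's SpecifierSet, which also handles pre-release and dev versions that this sketch ignores):

```python
def parse_release(version):
    """Turn a plain release string such as '0.19.0' into a comparable tuple.

    Simplified on purpose: trailing zeros are dropped so '0.19' == '0.19.0',
    and pre-release/dev suffixes (e.g. '4.40.0.dev0') are NOT handled.
    """
    parts = [int(part) for part in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies(version, lower, upper):
    """Check a '>=lower,<upper' pin, like tokenizers<0.19,>=0.14."""
    return parse_release(lower) <= parse_release(version) < parse_release(upper)


print(satisfies("0.19.0", "0.14", "0.19"))  # False: 0.19.0 hits the upper bound
print(satisfies("0.15.2", "0.14", "0.19"))  # True
```

Because the upper bound is exclusive, any 0.19.x release of Tokenizers is rejected until Transformers relaxes the pin.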

What a wild thing to observe, considering both projects come from the same team and rely on each other so heavily.

INFO:root:Processing image: anime-summerghost-54.png, data: <PIL.PngImagePlugin.PngImageFile image mode=RGB size=1920x1080 at 0x16FC842E0>
INFO:root:Using LLaVA 1.6+ model.
INFO:root:Inputs: {'input_ids': tensor([[59603,  9334,  1397,   562, 13310,  2756,   597,   663, 15874, 10357,
         14135,    98,   707, 14135,  3641,  6901,    97,  7283,    97,   597,
         31081,  8476,   592,   567,  2756, 59610, 59575,  3275,    98,  2134,
          1471, 59601, 59568, 64000,   144,  5697,   620,  2709,   594,   719,
          2728,   100, 39965,  8898,  9129, 59601]], device='mps:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]],
       device='mps:0'), 'pixel_values': tensor([[[[[ 1.3464,  1.3464,  1.3464,  ...,  0.0325,  0.1201,  0.1493],

Mine has 64000 in there, but it still doesn't work, even though I switched the processor config to use_fast=False.

@keunseop, so again I wonder how you got this working when it has never worked.

Llava Hugging Face org

Will investigate, thanks for reporting.

from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration, BitsAndBytesConfig
import torch
from PIL import Image
import requests

# 4-bit NF4 quantization so the 34B model fits across several smaller GPUs
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-34b-hf")

# device_map="auto" spreads the quantized model over all visible GPUs;
# calling model.to(...) on a 4-bit bitsandbytes model is not supported, so it is omitted
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-34b-hf",
    quantization_config=quantization_config,
    device_map="auto",
)

# prepare image and text prompt, using the appropriate prompt template
url = "https://github.com/haotian-liu/LLaVA/blob/1a91fc274d7c35a9b50b3cb29c4247ae5837ce39/images/llava_v1_5_radar.jpg?raw=true"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "<|im_start|>system\nAnswer the questions.<|im_end|><|im_start|>user\n<image>\nWhat is shown in this image?<|im_end|><|im_start|>assistant\n"

# pass text and images by keyword so the call does not depend on argument order
inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda:0")

# autoregressively complete prompt
output = model.generate(**inputs, max_new_tokens=100)

print(processor.decode(output[0], skip_special_tokens=True))

@ptx0

Added quantization code for inference on multiple GPUs.

Screenshot 2024-04-18 at 11.15.38 AM.png

I'm running into the same problem.

Llava Hugging Face org

This should be solved in the latest version of Transformers. Can you confirm whether you still observe the bug after updating?

Also, I believe there was a similar problem when using the "mps" device; see https://github.com/huggingface/transformers/issues/30294 for details.

Same problem; upgrading Transformers to 4.42.3 does not solve the issue.

I found that the token index of <image> in added_tokens.json is 64003, but the default image_token_index is 64000.
So I added one line to the demo code, and then it worked:

inputs['input_ids'][inputs['input_ids'] == 64003] = 64000
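
The one-line workaround above hard-codes both IDs. A slightly more defensive sketch remaps only when the two IDs actually differ. It works on plain lists and the helper name is hypothetical; with real objects, the IDs would presumably come from processor.tokenizer.convert_tokens_to_ids("<image>") and model.config.image_token_index, and the result would need converting back to a tensor (e.g. with torch.tensor):

```python
def remap_token_ids(input_ids, from_id, to_id):
    """Return a copy of nested input_ids with every from_id replaced by to_id.

    List-based equivalent of the tensor one-liner
    inputs['input_ids'][inputs['input_ids'] == 64003] = 64000
    """
    if from_id == to_id:
        # tokenizer and model already agree; just copy the rows unchanged
        return [list(row) for row in input_ids]
    return [[to_id if tok == from_id else tok for tok in row] for row in input_ids]


print(remap_token_ids([[1471, 64003, 144]], from_id=64003, to_id=64000))
# [[1471, 64000, 144]]
```

Skipping the remap when the IDs match means the same code keeps working once the checkpoint files are fixed upstream.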

Llava Hugging Face org

Hi, yes, this is being discussed here: https://github.com/huggingface/transformers/issues/31713

Llava Hugging Face org

Rolled back the commits to make sure it works. The updates were related to adding the chat template, which @RaushanTurganbay will take care of when she's back.

I'm using "llava-hf/llava-v1.6-mistral-7b-hf" and just got rid of this error in my code. Double-check that you're always using the LlavaNext classes where possible: I was using LlavaForConditionalGeneration instead of LlavaNextForConditionalGeneration.

@bghira I guess you need to use the correct chat template. It works for me.
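
The template differs between checkpoints: the 34b demo uses the ChatML-style format shown in the code above, while other checkpoints such as the Mistral-7B variant use a different one. A small helper that reproduces the 34b demo prompt exactly can help rule out template mistakes. A minimal sketch; the function name is hypothetical and the format is copied from the demo prompt in this thread (recent Transformers releases also offer processor.apply_chat_template for this):

```python
def build_llava_34b_prompt(question, system="Answer the questions."):
    """Build the ChatML-style prompt used in the 1.6-34b demo code.

    A single <image> placeholder is placed at the start of the user turn,
    followed by a newline and the question text.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>"
        f"<|im_start|>user\n<image>\n{question}<|im_end|>"
        "<|im_start|>assistant\n"
    )


print(build_llava_34b_prompt("What is shown in this image?"))
```

Generating the string from one function keeps the special tokens and newlines identical across calls, which is where hand-edited prompts tend to go wrong.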