ValueError: The input provided to the model are wrong. The number of image tokens is 0 while the number of image given to the model is 1. This prevents correct indexing and breaks batch generation.

#2
by barleyspectacular - opened

Error in the example

I got the same error.

Same error here

Llava Hugging Face org

Thanks for reporting, looking into this. It has to do with a discrepancy between the slow and fast tokenizers.

A current workaround is using this:

from transformers import LlavaNextProcessor

# use_fast=False loads the slow tokenizer, which avoids the slow/fast discrepancy
processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-34b-hf", use_fast=False)
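To confirm whether the workaround took effect, one option is to count the image placeholder tokens in the processed prompt before calling generate. The sketch below is only an illustration: `count_image_tokens` and `check_prompt` are hypothetical helpers, not part of transformers, and the token ids are made up.

```python
from typing import Sequence


def count_image_tokens(input_ids: Sequence[int], image_token_id: int) -> int:
    """Count how many ids in the prompt equal the image placeholder id."""
    return sum(1 for token_id in input_ids if token_id == image_token_id)


def check_prompt(input_ids: Sequence[int], image_token_id: int, num_images: int) -> int:
    """Raise early if images were passed but no placeholder token survived tokenization."""
    found = count_image_tokens(input_ids, image_token_id)
    if num_images > 0 and found == 0:
        raise ValueError(
            f"{num_images} image(s) given but 0 image tokens found in the prompt"
        )
    return found


# Hypothetical ids: 99 stands in for the image placeholder token.
print(check_prompt([1, 5, 99, 7, 8], image_token_id=99, num_images=1))
```

With a real processor, the placeholder id could presumably be looked up with `processor.tokenizer.convert_tokens_to_ids("<image>")`, assuming the prompt uses the `<image>` placeholder, and the ids taken from the processor output's `input_ids`.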

A question: can llava-hf/llava-v1.6-34b-hf be loaded on a single V100-32G GPU?

Llava Hugging Face org
edited Mar 21, 2024

Hi, with 4-bit quantization each parameter takes half a byte, so the weights of this 34B-parameter model need roughly 34/2 = 17 GB of GPU memory. So yes, that should work.
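The arithmetic behind that estimate can be sketched as follows. `weight_memory_gb` is a hypothetical helper written for this illustration; it covers the weights only and ignores activations, the KV cache, and any quantization overhead, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Rough memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


# 34B parameters at 4 bits each versus 16 bits each.
print(weight_memory_gb(34e9, 4))   # 17.0
print(weight_memory_gb(34e9, 16))  # 68.0
```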

Your mileage may vary, but with 4-bit quantization and Flash Attention 2, I was just able to run this model on my 24G 3090 Ti. I even had to reduce Torch's CUDA split size to squeeze out every last bit of memory; otherwise I was hitting OOMs.
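A setup along those lines might look like the sketch below. It is not the poster's actual configuration: the split size of 128 MB, the float16 compute dtype, and the helper names `cuda_alloc_conf` and `load_llava_4bit` are assumptions for illustration. It also assumes bitsandbytes and flash-attn are installed; as far as I know, Flash Attention 2 needs an Ampere-or-newer GPU, so that part would not carry over to a V100.

```python
import os


def cuda_alloc_conf(max_split_size_mb: int) -> str:
    """Build the value for PyTorch's PYTORCH_CUDA_ALLOC_CONF environment variable."""
    return f"max_split_size_mb:{max_split_size_mb}"


# Must be set before CUDA is initialized. 128 is an assumed value, not from the thread.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", cuda_alloc_conf(128))


def load_llava_4bit(model_id: str = "llava-hf/llava-v1.6-34b-hf"):
    """Load the model in 4-bit with Flash Attention 2 (needs a GPU and extra packages)."""
    # Imported lazily so the rest of this file runs without transformers installed.
    import torch
    from transformers import BitsAndBytesConfig, LlavaNextForConditionalGeneration

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,  # assumed compute dtype
    )
    return LlavaNextForConditionalGeneration.from_pretrained(
        model_id,
        quantization_config=quant_config,
        attn_implementation="flash_attention_2",
        device_map="auto",
    )
```

Calling `load_llava_4bit()` downloads the checkpoint and places it on the available GPU; a smaller split size trades some allocator speed for less fragmentation, which may help when memory is this tight.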

Llava Hugging Face org

This issue is fixed now (kudos to @ArthurZ)! I have updated the code snippet in the model card.

nielsr changed discussion status to closed

NOT fixed.
