Bug in 4-bit quantization?
I am following the tutorial from the model card, with minor modifications such as device_map='auto' and os.environ['CUDA_VISIBLE_DEVICES'] = '2,3,4,5,6,7', and I am running it inside a Jupyter notebook. It works well with 8-bit quantization, but with 4-bit quantization the model replies "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!". The problem disappears when I replace load_in_4bit with load_in_8bit, but then the model is too heavy to run with somewhat large images, even on 7 Titan RTX 24GB GPUs. Here is the code and output:
import os
os.environ['TRANSFORMERS_CACHE'] = './HFCache'
os.environ['HF_HOME'] = './HFCache'
os.environ['CUDA_VISIBLE_DEVICES'] = '2,3,4,5,6,7'
import requests
from PIL import Image
import torch
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration
model_id = "llava-hf/llava-onevision-qwen2-72b-ov-hf"
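model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    load_in_4bit=True,  # replacing this with load_in_8bit=True gives the correct output
    device_map='auto',
)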
processor = AutoProcessor.from_pretrained(model_id)
# Define a chat history and use `apply_chat_template` to get correctly formatted prompt
# Each value in "content" has to be a list of dicts with types ("text", "image")
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What are these?"},
            {"type": "image"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"
raw_image = Image.open(requests.get(image_file, stream=True).raw)
inputs = processor(images=raw_image, text=prompt, return_tensors='pt').to(torch.float16)#.to(0, torch.float16)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0][2:], skip_special_tokens=True))
Output when using load_in_4bit (incorrect):
What are these?assistant
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Output when using load_in_8bit (correct):
What are these?assistant
These are two cats lying on a pink blanket.
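A plausible reading of the all-"!" output (an assumption, not confirmed in this thread) is that the model's logits became NaN. With do_sample=False, generation picks the highest-scoring token at each step. NumPy's argmax treats NaN as the maximum and returns the index of the first NaN, so an all-NaN row yields id 0. The sketch assumes the greedy step in generate behaves the same way, and that token id 0 in this model's Qwen2 byte-level vocabulary decodes to "!", which I believe is the case. VOCAB, greedy_next_token and the logits are hypothetical, made up for illustration.

```python
import numpy as np

# Hypothetical toy vocabulary; we assume id 0 decodes to "!" as in Qwen2's vocab.
VOCAB = ["!", "These", " are", " two", " cats"]


def greedy_next_token(logits: np.ndarray) -> int:
    """Return the id of the highest-scoring token (greedy decoding, do_sample=False)."""
    # NumPy's argmax treats NaN as the maximum and returns the first NaN's index.
    return int(np.argmax(logits))


# Made-up logits: one healthy step and one step where every score became NaN.
healthy = np.array([0.1, 5.0, 1.0, 0.5, 0.2], dtype=np.float32)
broken = np.full(len(VOCAB), np.nan, dtype=np.float32)

print(VOCAB[greedy_next_token(healthy)])  # These
print(VOCAB[greedy_next_token(broken)])   # !
```

If every decoding step produced NaN logits, the decoded reply would be a run of "!", which matches the 4-bit output reported in this thread.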
Environment details
Hmm, I tried the same code with 4-bit and got "These are two cats lying on a pink blanket." as the reply. This may be hardware-related, since I have the same versions as you.
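One way the result could differ between machines running identical library versions (a hypothesis, not confirmed here) is float16 overflow. The inputs are cast to torch.float16, and float16's largest finite value is 65504. A single larger intermediate value becomes inf, and inf - inf then yields NaN. Which kernels run, and therefore whether some intermediate crosses that limit, can depend on the GPU. The NumPy sketch below shows the effect with a hypothetical softmax helper and made-up activations; float32 stands in for a wider-range dtype.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Softmax with the usual max-subtraction trick, computed in x's dtype."""
    shifted = x - x.max()
    e = np.exp(shifted)
    return e / e.sum()


# Made-up activations; 70000 exceeds float16's largest finite value (65504).
acts = [1.0, 2.0, 70000.0]

with np.errstate(all="ignore"):  # silence the overflow/invalid-value warnings
    probs16 = softmax(np.array(acts, dtype=np.float16))  # 70000 -> inf, inf - inf -> NaN
    probs32 = softmax(np.array(acts, dtype=np.float32))  # fits comfortably in float32

print(float(np.finfo(np.float16).max))  # 65504.0
print(probs16)  # every entry is nan
print(probs32)
```

bfloat16 shares float32's exponent range, so it avoids this particular overflow, though not every GPU supports it natively.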