The model supports multi-image and multi-prompt generation?
I tried this, but I am having problems getting both images into the model.
Here is my code. There are no errors, but it seems only one image is described:
import requests
from PIL import Image
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
import transformers
print(transformers.__version__)
model_id = "/home/sujit/Downloads/text-generation-webui-main/models/YouLiXiya_tinyllava-v1.0-1.1b-hf"
#model_id = "/home/sujit/Downloads/text-generation-webui-main/models/bczhou_tiny-llava-v1-hf"
question = "<image><image>Describe each image. \nAssistant:"
image1 = "/home/sujit/Pictures/Barnaby and Bella.jpg"
image2 = "/home/sujit/Pictures/Daisy puppy.jpg"
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=False,
).to(0)
processor = AutoProcessor.from_pretrained(model_id)
raw_image1 = Image.open(image1)
raw_image2 = Image.open(image2)
inputs = processor(text=question, images=(raw_image1, raw_image2), return_tensors='pt').to(0, torch.float16)
output = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(processor.decode(output[0][2:], skip_special_tokens=True))
My output:
4.36.2
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Describe each image.
Assistant: 1. A large brown dog with a blue collar is laying on a green blanket. 2. A smaller brown dog is laying on a green blanket.
The model was trained with a single image per prompt. Multiple images are not supported.
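Given that, one possible workaround is to run the model once per image, with a single <image> placeholder in each prompt. A minimal sketch, assuming the model and processor are loaded as in the code above; describe_one_at_a_time is a hypothetical helper name, not part of transformers:

```python
def describe_one_at_a_time(model, processor, images, prompt, max_new_tokens=100):
    """Run one generate() call per image instead of passing all images at once.

    `prompt` must contain exactly one <image> placeholder.
    This helper is illustrative only; it is not part of transformers.
    """
    if prompt.count("<image>") != 1:
        raise ValueError("prompt must contain exactly one <image> placeholder")
    descriptions = []
    for image in images:
        # Same processor/generate calls as in the original post, one image each.
        inputs = processor(text=prompt, images=image, return_tensors="pt")
        inputs = inputs.to(model.device, model.dtype)
        output = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
        # The decoded text contains the prompt followed by the model's answer.
        descriptions.append(processor.decode(output[0], skip_special_tokens=True))
    return descriptions
```

With the variables from the original post, this could be called as describe_one_at_a_time(model, processor, [raw_image1, raw_image2], "USER: <image>\nDescribe the image.\nASSISTANT:"). It costs one forward pass per image, but each call stays within the single-image setup the model was trained on.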
@YouLiXiya Then please update your model card, which states:
How to use the model
First, make sure to have transformers >= 4.35.3. The model supports multi-image and multi-prompt generation. Meaning that you can pass multiple images in your prompt. Make sure also to follow the correct prompt template (USER: xxx\nASSISTANT:) and add the token <image> to the location where you want to query images:
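Note that the prompt in the original post ends with "Assistant:" and has no "USER:" prefix, so it does not follow this template. A small sketch of a prompt in the quoted format; build_prompt is a hypothetical helper, and placing the image token directly after "USER:" is an assumption rather than something the model card text above specifies:

```python
IMAGE_TOKEN = "<image>"

def build_prompt(question):
    """Build a single-image prompt in the USER/ASSISTANT template."""
    # Assumption: the image token goes right after "USER:", before the question.
    return f"USER: {IMAGE_TOKEN}\n{question}\nASSISTANT:"

prompt = build_prompt("Describe the image.")
print(repr(prompt))
```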