The model supports multi-image and multi-prompt generation?

#1
by sujitvasanth - opened

I tried this and I am having problems getting both images into the model.
Here is my code. There are no errors, but it seems only one image is described:

import torch
import transformers
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

print(transformers.__version__)

model_id = "/home/sujit/Downloads/text-generation-webui-main/models/YouLiXiya_tinyllava-v1.0-1.1b-hf"
#model_id = "/home/sujit/Downloads/text-generation-webui-main/models/bczhou_tiny-llava-v1-hf" 

question = "<image><image>Describe each image. \nAssistant:"
image1 = "/home/sujit/Pictures/Barnaby and Bella.jpg"
image2 = "/home/sujit/Pictures/Daisy puppy.jpg"

model = LlavaForConditionalGeneration.from_pretrained(
    model_id, 
    torch_dtype=torch.float16, 
    low_cpu_mem_usage=False,
).to(0)

processor = AutoProcessor.from_pretrained(model_id)

raw_image1 = Image.open(image1)
raw_image2 = Image.open(image2)
inputs = processor(text=question, images=(raw_image1, raw_image2), return_tensors='pt').to(0, torch.float16)

output = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(processor.decode(output[0][2:], skip_special_tokens=True))

My output:
4.36.2
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Describe each image.
Assistant: 1. A large brown dog with a blue collar is laying on a green blanket. 2. A smaller brown dog is laying on a green blanket.

The model was trained on single-image inputs only, so multiple images are not supported.
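
Given that limitation, one possible workaround is to run the model once per image, with exactly one <image> token in each prompt. The following is only a rough sketch, not something taken from the model card: the helper names build_single_image_prompts and describe_images_separately are made up for illustration, and putting <image> right after "USER:" is an assumption based on the USER: xxx\nASSISTANT: template quoted in this thread.

```python
from typing import List, Sequence

IMAGE_TOKEN = "<image>"  # placeholder token used in the prompts in this thread


def build_single_image_prompts(question: str, num_images: int) -> List[str]:
    """Return one prompt per image, each with exactly one <image> placeholder.

    Hypothetical helper; the token placement is an assumption, not a documented rule.
    """
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    # Template follows the model card quote: USER: xxx\nASSISTANT:
    return [f"USER: {IMAGE_TOKEN}\n{question}\nASSISTANT:" for _ in range(num_images)]


def describe_images_separately(
    model_id: str,
    image_paths: Sequence[str],
    question: str,
    max_new_tokens: int = 100,
) -> List[str]:
    """Call generate() once per image, so the model only ever sees a single image."""
    # Heavy imports stay local so the prompt helper works without a GPU stack.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    model = LlavaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to(0)
    processor = AutoProcessor.from_pretrained(model_id)

    prompts = build_single_image_prompts(question, len(image_paths))
    answers = []
    for path, prompt in zip(image_paths, prompts):
        image = Image.open(path).convert("RGB")
        inputs = processor(text=prompt, images=image, return_tensors="pt").to(0, torch.float16)
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
        # Keep only the newly generated tokens, dropping the echoed prompt.
        new_tokens = output[0][inputs["input_ids"].shape[1]:]
        answers.append(processor.decode(new_tokens, skip_special_tokens=True).strip())
    return answers
```

Calling describe_images_separately(model_id, [image1, image2], "Describe the image.") with the paths from the original post should return one description per image. This costs one generate() call per image, but it keeps every input in the single-image format the model was trained on.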

@YouLiXiya then please update your model card, which states:

How to use the model
First, make sure to have transformers >= 4.35.3. The model supports multi-image and multi-prompt generation. Meaning that you can pass multiple images in your prompt. Make sure also to follow the correct prompt template (USER: xxx\nASSISTANT:) and add the token <image> to the location where you want to query images:
