Model Card for Model ID

A LoRA adapter for BLIP-2 that generates question-answer pairs (QAs) from an image.

Inference Demo

from datasets import load_dataset 
from peft import PeftModel
import torch
from transformers import AutoProcessor, Blip2ForConditionalGeneration

# prepare the model
processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("ybelkada/blip2-opt-2.7b-fp16-sharded", device_map="auto", load_in_8bit=True)
model = PeftModel.from_pretrained(model, "curlyfu/blip2-OCR-QA-generation")

# prepare inputs
dataset = load_dataset("howard-hou/OCR-VQA", split="test")
example = dataset[10]
image = example["image"]

# preprocess the image and move it to the GPU in half precision
inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
pixel_values = inputs.pixel_values

# generate token ids from the image alone, then decode them to text
generated_ids = model.generate(pixel_values=pixel_values, max_length=100)
generated_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_caption)
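
The steps above can be wrapped into a small helper that runs several images in one call. This is a minimal sketch, not part of the original demo: the function name `generate_texts` and its parameters are hypothetical, and it assumes `model` and `processor` were created as in the demo.

```python
from typing import Any, List


def generate_texts(model: Any, processor: Any, images: List[Any],
                   device: str = "cuda", dtype: Any = None,
                   max_length: int = 100) -> List[str]:
    """Run the model on a list of images and return one decoded string per image."""
    # Preprocess all images into a single batch of pixel values.
    inputs = processor(images=images, return_tensors="pt")
    # Move the batch to the target device, casting to `dtype` when one is given
    # (the demo uses torch.float16 on "cuda").
    inputs = inputs.to(device, dtype) if dtype is not None else inputs.to(device)
    # Generate from the images only, as in the demo (no text prompt).
    generated_ids = model.generate(pixel_values=inputs.pixel_values,
                                   max_length=max_length)
    texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
    # Strip surrounding whitespace from each decoded string.
    return [text.strip() for text in texts]


# Hypothetical usage with the objects from the demo:
# images = [dataset[i]["image"] for i in range(4)]
# for text in generate_texts(model, processor, images, dtype=torch.float16):
#     print(text)
```

Passing the model and processor as arguments keeps the helper independent of how they were loaded, so it should work the same with or without 8-bit loading.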

Thanks

huggingface/notebooks
