metadata

library_name: transformers
license: mit
pipeline_tag: image-to-text

Blip Image Captioning Base BF16

This model is a reduced-precision (bfloat16) version of Salesforce/blip-image-captioning-base, an image-to-text model. Converting the weights from float32 to bfloat16 shrinks the memory footprint from 989 MB to 494 MB, a reduction of about 50 percent.
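The roughly 50 percent saving follows from the storage size per parameter: float32 uses 4 bytes and bfloat16 uses 2. The snippet below is a minimal sketch of that arithmetic in PyTorch. It uses a small stand-in network (hypothetical, not BLIP itself) so it runs without downloading any weights. For a loaded transformers model, `model.get_memory_footprint()` reports a comparable figure, which includes buffers as well as parameters.

```python
import copy

import torch
from torch import nn


def param_bytes(model: nn.Module) -> int:
    """Total bytes occupied by the model's parameters."""
    return sum(p.numel() * p.element_size() for p in model.parameters())


# Hypothetical stand-in model used only to illustrate the arithmetic (not BLIP).
model_fp32 = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
fp32_bytes = param_bytes(model_fp32)

# Module.to() casts in place, so cast a deep copy to keep the float32 original intact.
# Each floating-point parameter goes from 4 bytes (float32) to 2 bytes (bfloat16).
model_bf16 = copy.deepcopy(model_fp32).to(torch.bfloat16)
bf16_bytes = param_bytes(model_bf16)

print(f"float32:  {fp32_bytes / 1e6:.2f} MB")
print(f"bfloat16: {bf16_bytes / 1e6:.2f} MB")
print(f"ratio:    {fp32_bytes / bf16_bytes:.1f}x")
```

Because only the storage width changes, the parameter count stays the same and the byte count is exactly halved. The same arithmetic accounts for 989 MB shrinking to 494 MB.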

Example


Generated caption for the example image: "a cat sitting on top of a purple and red striped carpet"

How to Get Started with the Model

Use the code below to get started with the model.

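import requests
import torch
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

model_id = "gospacedev/blip-image-captioning-base-bf16"

# Request bfloat16 explicitly so the weights are not upcast to float32
# (the default dtype in many transformers versions)
model = BlipForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16)
processor = BlipProcessor.from_pretrained(model_id)

# Load sample image (placeholder URL: replace with the image you want to caption)
img_url = "https://example.com/your-image.jpg"
image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")

# Preprocess, then cast the float pixel values to bfloat16 to match the model weights
inputs = processor(image, return_tensors="pt").to(torch.bfloat16)

# Generate a caption and decode the token IDs to text
output = model.generate(**inputs)
result = processor.decode(output[0], skip_special_tokens=True)

print(result)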

Model Details

Developed by: Grantley Cullar
Model type: Image-to-Text
Language(s) (NLP): English
License: MIT License