	---
	library_name: transformers
	license: mit
	pipeline_tag: image-to-text
	---

	# BLIP Image Captioning Base BF16

	This model is a quantized version of [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base), an image-to-text model.
	Reducing the weight precision from float32 to bfloat16 shrinks the memory footprint from 989 MB to 494 MB, cutting the model's memory size by about 50 percent.
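
The 50 percent figure follows from the storage width of each dtype: float32 uses 4 bytes per parameter and bfloat16 uses 2. The sketch below reproduces that arithmetic. It assumes the reported sizes cover weights only and that 1 MB means 10**6 bytes; the parameter count is inferred from the 989 MB figure rather than taken from the model itself, and `footprint_mb` is a hypothetical helper name.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def footprint_mb(num_params: float, dtype: str) -> float:
    """Weights-only footprint in MB (1 MB taken as 10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


# Assumption: parameter count inferred from the reported 989 MB float32 size.
approx_params = 989 * 1e6 / BYTES_PER_PARAM["float32"]

fp32_mb = footprint_mb(approx_params, "float32")
bf16_mb = footprint_mb(approx_params, "bfloat16")
print(f"float32: {fp32_mb:.0f} MB, bfloat16: {bf16_mb:.1f} MB")
```

Under these assumptions the estimate comes out at 494.5 MB for bfloat16, in line with the reported 494 MB.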

	## Example

	| <img src="https://huggingface.co/gospacedev/blip-image-captioning-base-bf16/resolve/main/cat%20in%20currents.png" width="316" height="316"> |
	|---|
	| a cat sitting on top of a purple and red striped carpet |

	## How to Get Started with the Model

	Use the code below to get started with the model.

	```python
	from transformers import BlipForConditionalGeneration, BlipProcessor
	import requests
	from PIL import Image

	model = BlipForConditionalGeneration.from_pretrained("gospacedev/blip-image-captioning-base-bf16")
	processor = BlipProcessor.from_pretrained("gospacedev/blip-image-captioning-base-bf16")

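	# Load sample image (the cat picture shown in the Example section)
	img_url = "https://huggingface.co/gospacedev/blip-image-captioning-base-bf16/resolve/main/cat%20in%20currents.png"
	image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

	# Generate output
	inputs = processor(image, return_tensors="pt")
	output = model.generate(**inputs)
	result = processor.decode(output[0], skip_special_tokens=True)

	print(result)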
	```
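
Depending on the installed `transformers` version, `from_pretrained` may load the weights as float32 unless a dtype is requested, which would undo the memory saving. The sketch below is one possible way to keep the weights in bfloat16 at load time; it is not taken from the original card. `caption_image_bf16` is a hypothetical helper name, newer `transformers` releases may prefer `dtype` over `torch_dtype`, and bfloat16 inference on CPU depends on the PyTorch build in use.

```python
MODEL_ID = "gospacedev/blip-image-captioning-base-bf16"


def caption_image_bf16(image, model_id: str = MODEL_ID) -> str:
    """Caption a PIL image with the model weights held in bfloat16."""
    # Imported lazily so the helper can be defined before the libraries load.
    import torch
    from transformers import BlipForConditionalGeneration, BlipProcessor

    # Request bfloat16 explicitly instead of relying on the default dtype.
    model = BlipForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16
    )
    processor = BlipProcessor.from_pretrained(model_id)

    inputs = processor(image, return_tensors="pt")
    # The processor returns float32 pixel values; cast them to match the weights.
    pixel_values = inputs["pixel_values"].to(torch.bfloat16)

    with torch.no_grad():
        output = model.generate(pixel_values=pixel_values)
    return processor.decode(output[0], skip_special_tokens=True)
```

Calling `caption_image_bf16(image)` with an RGB `PIL.Image` should return a caption string; casting the inputs avoids a dtype mismatch between float32 pixel values and bfloat16 weights.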

	## Model Details

	- Developed by: Grantley Cullar
	- Model type: Image-to-Text
	- Language(s) (NLP): English
	- License: MIT License