---
library_name: transformers
license: mit
pipeline_tag: image-to-text
---

# Blip Image Captioning Base BF16

This model is a quantized version of [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base), an image-to-text model.
Casting the weight precision from float32 to bfloat16 reduces the memory footprint from 989 MB to 494 MB, cutting the model's memory size by roughly 50 percent.
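A conversion along these lines can be reproduced with `transformers` and `torch`; the sketch below is a minimal example of casting the original checkpoint to bfloat16, and the output directory name is illustrative.

```python
import torch
from transformers import BlipForConditionalGeneration, BlipProcessor

# Load the original float32 checkpoint and cast the weights to bfloat16
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base", torch_dtype=torch.bfloat16
)
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")

# Save the bfloat16 weights and processor config (directory name is illustrative)
model.save_pretrained("./blip-image-captioning-base-bf16")
processor.save_pretrained("./blip-image-captioning-base-bf16")
```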

## Example

| <img src="https://huggingface.co/gospacedev/blip-image-captioning-base-bf16/resolve/main/cat%20in%20currents.png" width="316" height="316"> |
|---|
| a cat sitting on top of a purple and red striped carpet |

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import requests
import torch
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

# Load the model in bfloat16 to keep the reduced memory footprint
model = BlipForConditionalGeneration.from_pretrained(
    "gospacedev/blip-image-captioning-base-bf16", torch_dtype=torch.bfloat16
)
processor = BlipProcessor.from_pretrained("gospacedev/blip-image-captioning-base-bf16")

# Load sample image
img_url = "https://huggingface.co/gospacedev/blip-image-captioning-base-bf16/resolve/main/cat%20in%20currents.png"
image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")

# Generate a caption; cast the floating-point inputs to match the model's dtype
inputs = processor(image, return_tensors="pt").to(torch.bfloat16)
output = model.generate(**inputs)
result = processor.decode(output[0], skip_special_tokens=True)

print(result)
```

## Model Details

- **Developed by:** Grantley Cullar
- **Model type:** Image-to-Text
- **Language(s) (NLP):** English
- **License:** MIT License