BrainBLIP
This model is not ready for production use and is in the preliminary stages of training. Use at your own risk.
Model Description
BrainBLIP is fine-tuned to produce more natural captions for text-to-image training datasets. It emphasizes natural language while adding a minimal number of tags for context.
How to Get Started with the Model
import requests
from PIL import Image
from transformers import AutoProcessor, BlipForConditionalGeneration

# The processor is loaded from the base BLIP checkpoint; the weights are BrainBLIP's.
processor = AutoProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("braintacles/brainblip").to("cuda")

# Replace with a local file path or an http(s) URL.
image_path_or_url = r"https://imagePath_or_url.jpg"
if image_path_or_url.startswith("http"):
    raw_image = Image.open(requests.get(image_path_or_url, stream=True).raw)
else:
    raw_image = Image.open(image_path_or_url)

inputs = processor(raw_image, return_tensors="pt").to("cuda")
out = model.generate(**inputs, min_length=40, max_new_tokens=75, num_beams=5, repetition_penalty=1.40)
caption = processor.decode(out[0], skip_special_tokens=True)
print(caption)
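Since the model is meant for captioning text-to-image training data, the single-image snippet above can be extended to a whole folder. The following is only a minimal sketch, not part of BrainBLIP itself: the helper name `write_caption_sidecars`, the `IMAGE_SUFFIXES` set, and the convention of saving each caption as a `.txt` file next to its image are all assumptions. Check what your training tool actually expects. The captioning step is passed in as a function, so the helper does not depend on how the model is loaded.

```python
from pathlib import Path
from typing import Callable, List

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def write_caption_sidecars(
    image_dir: str,
    caption_fn: Callable[[Path], str],
    overwrite: bool = False,
) -> List[Path]:
    """Write one .txt caption file next to each image in image_dir.

    caption_fn is a hypothetical callable that takes an image path and
    returns its caption (for example, a wrapper around the model code).
    Returns the caption files that were written.
    """
    written = []
    for image_path in sorted(Path(image_dir).iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files, including existing .txt captions
        caption_path = image_path.with_suffix(".txt")
        if caption_path.exists() and not overwrite:
            continue  # keep existing (possibly hand-written) captions
        caption = caption_fn(image_path).strip()
        caption_path.write_text(caption + "\n", encoding="utf-8")
        written.append(caption_path)
    return written
```

To use it with the model, wrap the `Image.open`, `processor`, `model.generate`, and `processor.decode` steps in a function that takes a path and returns the caption string, then pass that function as `caption_fn`. Existing caption files are left alone unless `overwrite=True`, so hand-written captions are not replaced by accident.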
Training Details
Training Data
I wrote all captions for this data by hand, with occasional help from GPT4. Very special thanks to the following people, who also contributed a huge amount of time hand-captioning some of the data: