lokibots/vit-patch16-1280-gpt2-large-image-summary
This model generates a text summary of a chart image. It accepts an image of up to 1280x768 pixels and produces a summary describing the image's contents. However, the model still needs further training.
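Images larger than the stated limit may need to be scaled down before they are passed to the model. The sketch below is one way to compute a target size that fits inside the limit while keeping the aspect ratio. It assumes that "1280x768" means width x height in pixels, which the model card does not state explicitly, and the helper name `fit_within` is hypothetical. Whether the model's own feature extractor already resizes its inputs is not covered here.

```python
def fit_within(width, height, max_width=1280, max_height=768):
    """Return (new_width, new_height) that fits inside the limit.

    Assumption: the limit is width x height in pixels. Images that are
    already small enough are returned unchanged (no upscaling).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Use the tighter of the two ratios; cap at 1.0 so nothing is enlarged.
    scale = min(max_width / width, max_height / height, 1.0)
    # Round to whole pixels and clamp to the valid range.
    new_width = min(max_width, max(1, round(width * scale)))
    new_height = min(max_height, max(1, round(height * scale)))
    return new_width, new_height


print(fit_within(1920, 1080))
print(fit_within(800, 600))
```

With Pillow, the result can be passed straight to `Image.resize`, for example `image = image.resize(fit_within(*image.size))`, since `image.size` is a `(width, height)` tuple.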
Sample inference code
from transformers import VisionEncoderDecoderModel, ViTFeatureExtractor, GPT2Tokenizer
from PIL import Image

# Load the encoder-decoder model, its image preprocessor, and the GPT-2 tokenizer.
model = VisionEncoderDecoderModel.from_pretrained("lokibots/vit-patch16-1280-gpt2-large-image-summary")
feature_extractor = ViTFeatureExtractor.from_pretrained("lokibots/vit-patch16-1280-gpt2-large-image-summary")
tokenizer = GPT2Tokenizer.from_pretrained('gpt2-large')

# "image_file" is a placeholder; replace it with the path to a chart image.
image = Image.open("image_file").convert("RGB")
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values

# Generate token IDs with beam search, then decode them into text.
gen_kwargs = {"max_length": 1024, "num_beams": 4}
output_ids = model.generate(pixel_values, **gen_kwargs)
preds = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
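The decoded `preds` value is a list with one string per input image, and decoded text often carries leading or trailing whitespace. A minimal sketch of tidying it up follows. The helper name `clean_predictions` is hypothetical, and the example string is made up for illustration rather than taken from real model output.

```python
def clean_predictions(preds):
    """Strip surrounding whitespace from each decoded summary and drop empty ones."""
    return [text.strip() for text in preds if text.strip()]


# Made-up stand-in for the list returned by tokenizer.batch_decode(...).
example_preds = ["  The chart shows sales rising over four quarters. \n", ""]

summaries = clean_predictions(example_preds)
for summary in summaries:
    print(summary)
```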