HV-Khurdula/Dua-Vision-Base

Tags: Image-Text-to-Text, vision-encoder-decoder, VisionTransformer, Inference Endpoints


HV-Khurdula committed on Oct 29, 2024 · Commit d84472f (verified) · Parent: e50c6a8

Update README.md

Files changed (1)

README.md +3 -0

README.md CHANGED

@@ -16,6 +16,9 @@ tags:
 # Dua-Vision-Base
 A Vision Encoder-Decoder model that doesn’t just caption images but generates questions and possible answers based on what it “sees.” Using ViT as the encoder and BART as the decoder, it’s built for image-based QA without the fluff.
 Translation: feed it an image, and get back a useful question-answer pair. Perfect for creating and synthesizing data in image QA tasks. It’s one model, two tasks, and a lot of potential!

 # Dua-Vision-Base
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/64f0cf1adcac1f99adbabb56/FZOLSnkBj_xPbaNQBqbU5.png)
 A Vision Encoder-Decoder model that doesn’t just caption images but generates questions and possible answers based on what it “sees.” Using ViT as the encoder and BART as the decoder, it’s built for image-based QA without the fluff.
 Translation: feed it an image, and get back a useful question-answer pair. Perfect for creating and synthesizing data in image QA tasks. It’s one model, two tasks, and a lot of potential!