kimihailv committed on
Commit
560c6e4
1 Parent(s): c72df27

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -14,7 +14,7 @@ For Content Understanding and Generation<br/>
 UForm-Gen is a small generative vision-language model primarily designed for Image Captioning and Visual Question Answering. The model consists of two parts:
 
 1. [UForm Vision Encoder](https://huggingface.co/unum-cloud/uform-vl-english)
-2. [Sheared-LLaMA-1.3B](https://huggingface.co/princeton-nlp/Sheared-LLaMA-1.3B) manually tuned on the instruction dataset
+2. [Sheared-LLaMA-1.3B](https://huggingface.co/princeton-nlp/Sheared-LLaMA-1.3B) manually tuned on the instructions dataset
 
 The model was pre-trained on: MSCOCO, SBU Captions, Visual Genome, VQAv2, GQA and a few internal datasets. UForm-Gen-Chat is SFT version of [`UForm-Gen`](https://huggingface.co/unum-cloud/uform-gen) for multimodal chat.
 