tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B

tinyllava committed on May 17

Commit 3fe6e8e • 1 Parent(s): 8564ba5

Update README.md

Files changed (1)

README.md +3 -2

@@ -3,8 +3,9 @@ license: apache-2.0
 pipeline_tag: image-text-to-text
 ---
-### <span style="font-size:2em;">TinyLLaVA</span>
-[![hf_space](https://img.shields.io/badge/🤗-%20Open%20In%20HF-blue.svg)](https://huggingface.co/tinyllava) [![arXiv](https://img.shields.io/badge/Arxiv-2402.14289-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2402.14289) [![Github](https://img.shields.io/badge/Github-Github-orange.svg)](https://github.com/TinyLLaVA/TinyLLaVA_Factory) [![Doc](https://img.shields.io/badge/Doc-Document-logo=read%20the%20docs&logoColor=white&label=Doc)](https://tinyllava-factory.readthedocs.io/en/latest/) [![Demo](https://img.shields.io/badge/Demo-Demo-red.svg)](http://8843843nmph5.vicp.fun/#/)
 TinyLLaVA has released a family of small-scale Large Multimodel Models(LMMs), ranging from 1.4B to 3.1B. Our best model, TinyLLaVA-Phi-2-SigLIP-3.1B, achieves better overall performance against existing 7B models such as LLaVA-1.5 and Qwen-VL.
 Here, we introduce TinyLLaVA-Phi-2-SigLIP-3.1B, which is trained by the TinyLLaVA Factory codebase. For LLM and vision tower, we choose [Phi-2](microsoft/phi-2) and [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384), respectively. The dataset used for training this model is the [ShareGPT4V](https://github.com/InternLM/InternLM-XComposer/blob/main/projects/ShareGPT4V/docs/Data.md) dataset.

 pipeline_tag: image-text-to-text
 ---
+**<center><span style="font-size:2em;">TinyLLaVA</span></center>**
+[![arXiv](https://img.shields.io/badge/Arxiv-2402.14289-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2402.14289)[![Github](https://img.shields.io/badge/Github-Github-blue.svg)](https://github.com/TinyLLaVA/TinyLLaVA_Factory)[![Demo](https://img.shields.io/badge/Demo-Demo-red.svg)](http://8843843nmph5.vicp.fun/#/)
 TinyLLaVA has released a family of small-scale Large Multimodel Models(LMMs), ranging from 1.4B to 3.1B. Our best model, TinyLLaVA-Phi-2-SigLIP-3.1B, achieves better overall performance against existing 7B models such as LLaVA-1.5 and Qwen-VL.
 Here, we introduce TinyLLaVA-Phi-2-SigLIP-3.1B, which is trained by the TinyLLaVA Factory codebase. For LLM and vision tower, we choose [Phi-2](microsoft/phi-2) and [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384), respectively. The dataset used for training this model is the [ShareGPT4V](https://github.com/InternLM/InternLM-XComposer/blob/main/projects/ShareGPT4V/docs/Data.md) dataset.