---
license: mit
---

SESAME Model Card

Model details
|
Model type: SESAME is an open-source multimodal model trained by fine-tuning LLaVA on a variety of instruction-based image grounding (segmentation) data. It combines an auto-regressive language model with a segmentation model.

Paper or resources for more information: https://see-say-segment.github.io/

Where to send questions or comments about the model: https://github.com/see-say-segment/sesame/issues

Intended use
|
Primary intended uses: The primary use of SESAME is research on large multimodal models and chatbots.

Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

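Training dataset: RefCOCO, RefCOCO+, and RefCOCOg together with their FP- and R- variants (written collectively as (FP-/R-)RefCOCO(+/g)), plus the LLaVA 150K VQA data.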