SESAME

  • Model type: SESAME is an open-source multimodal model obtained by fine-tuning LLaVA on instruction-based image grounding (segmentation) data. It combines an auto-regressive language model with a segmentation model.
  • Paper or resources for more information: https://see-say-segment.github.io/
  • Where to send questions or comments about the model: https://github.com/see-say-segment/sesame/issues
  • Intended use
    • Primary intended uses: The primary use of SESAME is research on large multimodal models and chatbots.
    • Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
  • Training dataset: (FP-/R-)RefCOCO(+/g) + LLaVA 150K VQA data