MV-LLaVA-7B Model Card

Model details

Model type: MV-LLaVA-7B is an open-source chatbot for 3D multi-view images, trained by fine-tuning the CLIP vision tower and LLaMA/Vicuna on GPT4-Vision-assisted BS-Objaverse data and ShareGPT4V data.

Model date: MV-LLaVA-7B was trained in April 2024.

Paper or resources for more information: [Project] [Paper] [Code]

Usage

You can use this model directly with the code provided in our [repository].
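
For reference, below is a minimal sketch of loading the checkpoint with Hugging Face Transformers. The Hub ID, the generic Auto classes, and the use of trust_remote_code are assumptions here, not the authors' documented entry point; the full multi-view inference pipeline (image preprocessing and conversation template) is provided in the [repository].

```python
# Minimal loading sketch. Assumptions: the Hub ID "Zery/MV-LLaVA-7B" and the
# generic Auto classes; the actual inference pipeline lives in the repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zery/MV-LLaVA-7B"  # assumed Hub ID, taken from this model card

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # half precision so the 7B model fits on one GPU
    low_cpu_mem_usage=True,
    device_map="auto",           # place weights across available devices
    trust_remote_code=True,      # custom (share4v) architectures ship their own code
)
model.eval()
```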

License

Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.

Intended use

Primary intended uses: The primary use of MV-LLaVA-7B is research on large multimodal models and chatbots for 3D content.

Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

Training dataset

  • 1.2M ShareGPT4V-PT data
  • 30K GPT4-Vision-generated multi-view image-text pairs
  • LLaVA instruction-tuning data