
PLLaVA Model Card

Model details

Model type: PLLaVA-13B is an open-source video-language chatbot, trained by fine-tuning an image-language model (Image-LLM) on video instruction-following data. It is an auto-regressive language model based on the transformer architecture.

Base model (Image-LLM): llava-hf/llava-v1.6-vicuna-13b-hf

Model date: PLLaVA-13B was trained in April 2024.

Paper or resources for more information:

License

Governed by the license of llava-hf/llava-v1.6-vicuna-13b-hf.

Where to send questions or comments about the model: https://github.com/magic-research/PLLaVA/issues

Intended use

Primary intended uses: The primary use of PLLaVA is research on large multimodal models and chatbots.

Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

Training dataset

Video instruction-tuning data from OpenGVLab/VideoChat2-IT.

Evaluation dataset

A collection of 6 benchmarks: 5 video question-answering (VQA) benchmarks and 1 recent benchmark proposed specifically for video large multimodal models (Video-LMMs).

Model weights

Format: Safetensors
Model size: 13.5B params
Tensor type: BF16
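A rough way to size hardware for this checkpoint is to multiply the parameter count by the bytes per parameter of its tensor type (BF16 uses 2 bytes). The Python sketch below does only that arithmetic. It is a lower bound under stated assumptions: it counts the stored weights alone, not activations, the KV cache, or video-frame features, and the helper name weight_memory_gib is hypothetical.

```python
# Rough estimate of the memory needed just to hold the weights, using the
# figures on this card (13.5B parameters, BF16). Assumption: activations,
# the KV cache, and runtime buffers are NOT included, so real usage is higher.

# Bytes occupied by one parameter in each storage format.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the size of the raw weights in GiB (2**30 bytes) for `dtype`."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    n = 13.5e9  # parameter count reported on this card
    for dtype in ("bf16", "fp32"):
        print(f"{dtype}: {weight_memory_gib(n, dtype):.1f} GiB")
```

Run as a script, it prints about 25.1 GiB for BF16 (27 GB in decimal units) and about 50.3 GiB if the weights were upcast to FP32, so the BF16 weights alone already exceed the memory of a 24 GB GPU.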
