Video-Text-to-Text
Transformers
Safetensors
English
llava_llama
Inference Endpoints
---
license: mit
pipeline_tag: video-text-to-text
datasets:
  - liuhaotian/LLaVA-Instruct-150K
  - OpenGVLab/VideoChat2-IT
language:
  - en
---

Paper: https://huggingface.co/papers/2409.01071