VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

If you like our project, please give us a star ⭐ on GitHub for the latest updates.

🌏 Model Zoo

📑 Citation

If you find VideoRefer Suite useful for your research and applications, please cite using this BibTeX:

@article{yuan2024videorefersuite,
  title = {VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM},
  author = {Yuqian Yuan and Hang Zhang and Wentong Li and Zesen Cheng and Boqiang Zhang and Long Li and Xin Li and Deli Zhao and Wenqiao Zhang and Yueting Zhuang and Jianke Zhu and Lidong Bing},
  journal={arXiv},
  year={2024},
  url = {}
}
Safetensors · Model size: 8.43B params · Tensor type: FP16

Collection including DAMO-NLP-SG/VideoRefer-7B-stage2.5