DAMO-NLP-SG/VideoRefer-7B-stage2.5

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

If you like our project, please give us a star ⭐ on GitHub for the latest updates.

🌏 Model Zoo

Model Name	Visual Encoder	Language Decoder	# Training Frames
VideoRefer-7B	siglip-so400m-patch14-384	Qwen2-7B-Instruct	16
VideoRefer-7B-stage2	siglip-so400m-patch14-384	Qwen2-7B-Instruct	16
VideoRefer-7B-stage2.5	siglip-so400m-patch14-384	Qwen2-7B-Instruct	16
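As a minimal sketch of how a checkpoint from the table might be fetched, the snippet below maps a model name to a Hub repository ID and downloads its files with `huggingface_hub.snapshot_download`. Only `DAMO-NLP-SG/VideoRefer-7B-stage2.5` is confirmed by this page; that the other two checkpoints live under the same organization is an assumption, and the helper names `repo_id` and `download` are hypothetical. Running inference on the downloaded weights requires the project's own code, which is not shown here.

```python
# Checkpoint names copied from the Model Zoo table.
# ASSUMPTION: all three are hosted under the same organization; this page
# only confirms DAMO-NLP-SG/VideoRefer-7B-stage2.5.
ORG = "DAMO-NLP-SG"
CHECKPOINTS = ["VideoRefer-7B", "VideoRefer-7B-stage2", "VideoRefer-7B-stage2.5"]


def repo_id(name: str) -> str:
    """Build the Hub repository ID for a checkpoint listed in the table."""
    if name not in CHECKPOINTS:
        raise ValueError(f"unknown checkpoint: {name}")
    return f"{ORG}/{name}"


def download(name: str) -> str:
    """Download every file of a checkpoint and return the local directory."""
    # Imported lazily so repo_id() works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id(name))


# Example (needs network access and several GB of disk space):
#   local_dir = download("VideoRefer-7B-stage2.5")
```

Keeping the name-to-ID mapping separate from the download call makes it possible to check the repository ID without touching the network.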

📑 Citation

If you find VideoRefer Suite useful for your research and applications, please cite using this BibTeX:

@article{yuan2024videorefersuite,
  title = {VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM},
  author = {Yuqian Yuan and Hang Zhang and Wentong Li and Zesen Cheng and Boqiang Zhang and Long Li and Xin Li and Deli Zhao and Wenqiao Zhang and Yueting Zhuang and Jianke Zhu and Lidong Bing},
  journal={arXiv},
  year={2024},
  url = {}
}
