| Model Name | Visual Encoder | Language Decoder | # Training Frames |
|---|---|---|---|
| VideoRefer-7B | siglip-so400m-patch14-384 | Qwen2-7B-Instruct | 16 |
| VideoRefer-7B-stage2 | siglip-so400m-patch14-384 | Qwen2-7B-Instruct | 16 |
| VideoRefer-7B-stage2.5 | siglip-so400m-patch14-384 | Qwen2-7B-Instruct | 16 |
If you find VideoRefer Suite useful for your research and applications, please cite using this BibTeX:
@article{yuan2024videorefersuite,
  title = {VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM},
  author = {Yuqian Yuan and Hang Zhang and Wentong Li and Zesen Cheng and Boqiang Zhang and Long Li and Xin Li and Deli Zhao and Wenqiao Zhang and Yueting Zhuang and Jianke Zhu and Lidong Bing},
  journal = {arXiv},
  year = {2024},
  url = {}
}