[EMNLP 2024] RWKV-CLIP: A Robust Vision-Language Representation Learner
This model is RWKV-CLIP-B/32 trained on YFCC15M. For more details, see https://github.com/deepglint/RWKV-CLIP.
If you find this model useful, please cite it using the following BibTeX entry.
@misc{gu2024rwkvclip,
title={RWKV-CLIP: A Robust Vision-Language Representation Learner},
author={Tiancheng Gu and Kaicheng Yang and Xiang An and Ziyong Feng and Dongnan Liu and Weidong Cai and Jiankang Deng},
year={2024},
eprint={2406.06973},
archivePrefix={arXiv},
primaryClass={cs.CV}
}