Kaichengalex/RWKV-CLIP-B32-YFCC15M

[EMNLP 2024] RWKV-CLIP: A Robust Vision-Language Representation Learner

This model is RWKV-CLIP-B/32 trained on YFCC15M. Please refer to https://github.com/deepglint/RWKV-CLIP for more detailed information.

If you find this model useful, please cite it using the following BibTeX entry.

@misc{gu2024rwkvclip,
      title={RWKV-CLIP: A Robust Vision-Language Representation Learner}, 
      author={Tiancheng Gu and Kaicheng Yang and Xiang An and Ziyong Feng and Dongnan Liu and Weidong Cai and Jiankang Deng},
      year={2024},
      eprint={2406.06973},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}