# Jina CLIP

The Jina CLIP implementation is hosted in this repository. The model uses:

* the EVA-02 architecture as the vision tower
* the Jina BERT with Flash Attention model as the text tower

To use the Jina CLIP model, the following packages are required (a minimal usage sketch follows the list):

* `torch`
* `timm`
* `transformers`
* `einops`
* `xformers`, to use xFormers attention
* `flash-attn`, to use Flash Attention
* `apex`, to use fused layer normalization
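
As a quick-start illustration, here is a minimal sketch of loading the model through the `transformers` `AutoModel` interface. The checkpoint name `jinaai/jina-clip-v1`, the example inputs, and the `encode_text`/`encode_image` helpers are assumptions about the published Hugging Face artifact, not guarantees of this repository's API; adjust them to the actual release.

```python
# Minimal usage sketch. Assumptions: the model is published on the
# Hugging Face Hub as 'jinaai/jina-clip-v1' and exposes encode_text /
# encode_image helpers; adjust names to the actual release.
from transformers import AutoModel

# trust_remote_code is needed because the towers (EVA-02 vision,
# Jina BERT text) are custom architectures, not built-in classes.
model = AutoModel.from_pretrained("jinaai/jina-clip-v1", trust_remote_code=True)

# Encode a batch of sentences into the shared embedding space.
text_embeddings = model.encode_text(["A photo of a cat sleeping on a sofa"])

# Encode images (by URL or local path) into the same space.
image_embeddings = model.encode_image(["https://example.com/cat.jpg"])

# Dot-product similarity scores text-image relevance
# (equal to cosine similarity if the embeddings are normalized).
print(text_embeddings @ image_embeddings.T)
```

The optional packages (`xformers`, `flash-attn`, `apex`) only take effect when the corresponding attention or normalization kernels are available on the target hardware; the sketch above works with the required packages alone.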