# Jina CLIP

The Jina CLIP implementation is hosted in this repository. The model uses:

* the EVA-02 architecture as the vision tower
* the Jina BERT with Flash Attention model as the text tower

To use the Jina CLIP model, the following packages are required (a minimal usage sketch follows the list):

* `torch`
* `timm`
* `transformers`
* `einops`
* `xformers`, to use xFormers attention
* `flash-attn`, to use Flash Attention
* `apex`, to use fused layer normalization
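
As a quick-start illustration, here is a minimal sketch of loading the model through the `transformers` `AutoModel` interface. The checkpoint name `jinaai/jina-clip-v1`, the example inputs, and the `encode_text`/`encode_image` helpers are assumptions about the published Hugging Face artifact, not guarantees of this repository's API; adjust them to the actual release.

```python
# Minimal usage sketch. Assumptions: the model is published on the
# Hugging Face Hub as 'jinaai/jina-clip-v1' and exposes encode_text /
# encode_image helpers; adjust names to the actual release.
from transformers import AutoModel

# trust_remote_code is needed because the towers (EVA-02 vision,
# Jina BERT text) are custom architectures, not built-in classes.
model = AutoModel.from_pretrained("jinaai/jina-clip-v1", trust_remote_code=True)

# Encode a batch of sentences into the shared embedding space.
text_embeddings = model.encode_text(["A photo of a cat sleeping on a sofa"])

# Encode images (by URL or local path) into the same space.
image_embeddings = model.encode_image(["https://example.com/cat.jpg"])

# Dot-product similarity scores text-image relevance
# (equal to cosine similarity if the embeddings are normalized).
print(text_embeddings @ image_embeddings.T)
```

The optional packages (`xformers`, `flash-attn`, `apex`) only take effect when the corresponding attention or normalization kernels are available on the target hardware; the sketch above works with the required packages alone.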