visheratin posted an update on Mar 10, 2024
Keep stacking cool stuff and getting better results! After I changed the standard vision encoder to SigLIP, NLLB-CLIP got a 10% average performance improvement. And now I added matryoshka representation learning (MRL) layers (https://arxiv.org/abs/2205.13147) to enable smaller embeddings and got another 6% performance boost! Plus, thanks to MRL, 4.5x smaller embeddings retain 90%+ of the quality.

The large model is finally SoTA for both image and text multilingual retrieval!

The models are available on the hub:
- visheratin/nllb-siglip-mrl-base
- visheratin/nllb-siglip-mrl-large
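
For anyone curious how the smaller MRL embeddings are used in practice: with matryoshka-trained models you can keep just the first k dimensions of each embedding and re-normalize before computing similarities. Here is a minimal sketch of that truncation step; the tensor shapes, the 768-dim full size, and the random stand-in embeddings are assumptions for illustration, not the models' exact API.

```python
# Minimal sketch of MRL-style embedding truncation (illustrative only).
# Assumes `full` is a batch of L2-normalized embeddings from one of the
# models above; the loading code is omitted since the exact interface
# may differ.
import torch
import torch.nn.functional as F

def truncate_embeddings(embeddings: torch.Tensor, dim: int) -> torch.Tensor:
    """Keep the first `dim` components and re-normalize (matryoshka-style)."""
    truncated = embeddings[:, :dim]
    return F.normalize(truncated, p=2, dim=-1)

# Stand-in for model outputs: 4 embeddings of assumed full size 768.
full = F.normalize(torch.randn(4, 768), dim=-1)
small = truncate_embeddings(full, dim=170)  # ~4.5x smaller embeddings
scores = small @ small.T                    # cosine similarities on compact embeddings
```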

hmm, what happens if you throw moondream2 on? https://huggingface.co/vikhyatk/moondream2

It uses the same vision encoder, so I expect that nothing changes.