visheratin posted an update (Mar 10):
Keep stacking cool stuff and getting better results! After I swapped the standard vision encoder for SigLIP, NLLB-CLIP got a 10% average performance improvement. And now I've added matryoshka layers (https://arxiv.org/abs/2205.13147) to enable smaller embeddings and got another 6% performance boost! Plus, thanks to MRL, 4.5x smaller embeddings retain 90%+ of the full quality.

The large model is finally SoTA for both image and text multilingual retrieval!

The models are available on the hub:
- visheratin/nllb-siglip-mrl-base
- visheratin/nllb-siglip-mrl-large
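
If you want to try the smaller embeddings, here is a minimal sketch of the MRL truncation step: take the full embedding, keep only the first k dimensions, and re-normalize. The embedding dimension (1152) and the target size k below are placeholders for illustration; check the model cards for the exact encoding API and the embedding sizes the models were trained for.

```python
import torch
import torch.nn.functional as F

# Suppose `full_emb` holds full-size embeddings produced by one of the
# NLLB-SigLIP-MRL encoders (the dimension here is just a placeholder).
full_emb = F.normalize(torch.randn(4, 1152), dim=-1)

# Matryoshka truncation: keep the first k dimensions and re-normalize.
# Per the post, ~4.5x smaller embeddings retain 90%+ of retrieval quality.
k = 256  # hypothetical target size
small_emb = F.normalize(full_emb[..., :k], dim=-1)

print(small_emb.shape)  # torch.Size([4, 256])
```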

hmm, what happens if you throw moondream2 on? https://huggingface.co/vikhyatk/moondream2


It uses the same vision encoder, so I expect that nothing changes.