That didn't take long! Nomic AI has fine-tuned the new ModernBERT-base encoder model into a strong embedding model for search, classification, clustering, and more!
Details:
🤏 Based on ModernBERT-base with 149M parameters.
📈 Outperforms both nomic-embed-text-v1 and nomic-embed-text-v1.5 on MTEB!
🏎️ Immediate FlashAttention 2 (FA2) and unpadding support for highly efficient inference.
🪆 Trained with Matryoshka support, i.e., two valid output dimensionalities: 768 and 256.
⚡️ Maximum sequence length of 8192 tokens!
2️⃣ Trained in 2 stages: unsupervised contrastive data -> high-quality labeled datasets.
✅ Integrated in Sentence Transformers, Transformers, LangChain, LlamaIndex, Haystack, etc. (quick-start sketch below).
🏛️ Apache 2.0 licensed: fully permissible for commercial use.
Try it out here: nomic-ai/modernbert-embed-base
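If you want to kick the tires, here's a minimal Sentence Transformers sketch. Two assumptions worth checking against the model card: the search_query:/search_document: prefixes follow Nomic's earlier embed models, and truncate_dim is Sentence Transformers' built-in Matryoshka truncation option.

```python
# Minimal sketch, assuming sentence-transformers >= 3.0 and the
# query/document prefixes used by Nomic's earlier embedding models.
from sentence_transformers import SentenceTransformer

# Loads at the full 768 dimensions; pass truncate_dim=256 instead to
# use the smaller Matryoshka output size.
model = SentenceTransformer("nomic-ai/modernbert-embed-base")

query_embeddings = model.encode(["search_query: What is TSNE?"])
doc_embeddings = model.encode([
    "search_document: t-SNE is a dimensionality reduction technique.",
    "search_document: ModernBERT is an encoder-only transformer.",
])

# Cosine similarity between each query and each document.
similarities = model.similarity(query_embeddings, doc_embeddings)
print(similarities)
```

Truncating to 256 dimensions trades a little retrieval accuracy for vectors a third the size, which is the point of the Matryoshka training.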
Very nice work by Zach Nussbaum and colleagues at Nomic AI.