Tom Aarsen

tomaarsen

AI & ML interests

NLP: text embeddings, information retrieval, named entity recognition, few-shot text classification

Organizations

Hugging Face, Sentence Transformers, Sentence Transformers - Cross-Encoders, Hugging Face Internal Testing Organization, SetFit, Hugging Face Fellows, Massive Text Embedding Benchmark, Open-Source AI Meetup, Nomic AI, Hugging Face OSS Metrics, Blog-explorers, Sentence Transformers Testing, mLLM multilingual, Social Post Explorers, Answer.AI, gg-tt, Distillation Hugs, Hugging Face Discord Community, Bert ... but new

tomaarsen's activity

posted an update 2 days ago
That didn't take long! Nomic AI has finetuned the new ModernBERT-base encoder model into a strong embedding model for search, classification, clustering and more!

Details:
🤖 Based on ModernBERT-base with 149M parameters.
📊 Outperforms both nomic-embed-text-v1 and nomic-embed-text-v1.5 on MTEB!
🏎️ Immediate FA2 and unpacking support for super efficient inference.
🪆 Trained with Matryoshka support, i.e. 2 valid output dimensionalities: 768 and 256.
➡️ Maximum sequence length of 8192 tokens!
2️⃣ Trained in 2 stages: unsupervised contrastive data -> high quality labeled datasets.
➕ Integrated in Sentence Transformers, Transformers, LangChain, LlamaIndex, Haystack, etc.
🏛️ Apache 2.0 licensed: fully commercially permissible

Try it out here: nomic-ai/modernbert-embed-base
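
For anyone curious what the Matryoshka support means in practice, here is a minimal sketch: keep the first 256 of the 768 dimensions and re-normalise. The helper names (`truncate_embedding`, `cosine_similarity`, `embed`) are made up for illustration. `embed` assumes that `sentence-transformers` is installed, that the model can be downloaded, and that passing `truncate_dim` is the right way to get the smaller embeddings for this model. Check the model card for the recommended usage, including any required input prefixes.

```python
import math

MODEL_ID = "nomic-ai/modernbert-embed-base"  # model name from the post
VALID_DIMS = (768, 256)  # output dimensionalities listed in the post


def truncate_embedding(vector, dim):
    """Keep the first `dim` components and L2-normalise the result."""
    head = [float(x) for x in vector[:dim]]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def embed(texts, dim=256):
    """Hypothetical wrapper: needs sentence-transformers and network access."""
    # Imported lazily so the helpers above work without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID, truncate_dim=dim)
    return model.encode(texts)


# Toy demonstration of truncation on a 3-dimensional vector (no model needed).
print(truncate_embedding([3.0, 4.0, 12.0], 2))  # [0.6, 0.8]
```

The smaller vectors take a third of the storage and are faster to compare, usually at some cost in retrieval quality.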

Very nice work by Zach Nussbaum and colleagues at Nomic AI.