Article • Introducing Observers: AI Observability with Hugging Face datasets through a lightweight SDK • By davidberenstein1957 • 4 days ago • 19
Paper • Drowning in Documents: Consequences of Scaling Reranker Inference • 2411.11767 • Published 7 days ago • 16
Article • Halo: Open Source Health Tracking with Wearables • By cyrilzakka • 6 days ago • 76
Article • Releasing the largest multilingual open pretraining dataset • By Pclanglais • 12 days ago • 94
Collection • Training with Prompts • See the Training with Prompts documentation for more details: https://sbert.net/examples/training/prompts/README.html • 5 items • Updated 18 days ago • 3
Article • Releasing Common Corpus: the largest public domain dataset for training LLMs • By Pclanglais • Mar 20 • 17
Collection • Model2Vec base models • These are the Minishlab Model2Vec base models. Load them and use them with model2vec (https://github.com/MinishLab/model2vec) or sentence-transformers • 7 items • Updated 27 days ago • 8
Collection • POTION • These are the flagship POTION models. Load them and use them with model2vec (https://github.com/MinishLab/model2vec) or sentence-transformers • 3 items • Updated 26 days ago • 6
Article • Releasing Outlines-core 0.1.0: structured generation in Rust and Python • Oct 22 • 41
Collection • Granite 3.0 Language Models • A series of language models trained by IBM and released under the Apache 2.0 license. We release both the base pretrained and instruct models. • 8 items • Updated 21 days ago • 91
Collection • MedEmbed: Embedding Models for Medical Domain • GitHub: https://github.com/abhinand5/MedEmbed • 4 items • Updated Oct 21 • 7
Article • MedEmbed: Fine-Tuned Embedding Models for Medical / Clinical IR • By abhinand • Oct 20 • 31