Tulu 3 Datasets Collection All datasets released with Tulu 3 -- state of the art open post-training recipes. • 32 items • Updated 25 days ago • 61
Wikimedia Datasets Collection Wikimedia datasets, across languages and modalities, from different Wikimedia projects, on the hub. Not all tested. • 19 items • Updated May 16 • 10
occiglot-eu5-7b-v0.1 Collection First release of 7B LLMs models for the 5 biggest European languages. All models initialised from mistral-7b-v0.1. • 10 items • Updated Mar 7 • 21
Improving Text Embeddings with Large Language Models Paper • 2401.00368 • Published Dec 31, 2023 • 79
Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning Paper • 2301.09626 • Published Jan 23, 2023 • 2