Introducing FineWeb-C 🌐🎓, a community-built dataset for improving language models in ALL languages. Inspired by FineWeb-Edu, the community is labelling the educational quality of texts for many languages. 318 annotators, 32K+ annotations, 12 languages - and growing! 🌍 data-is-better-together/fineweb-c
Direct Preference Optimization Datasets (collection): Datasets suitable for DPO based on having 'chosen', 'rejected', and 'prompt' columns. Created using librarian-bots/dataset-column-search-api • 4412 items • Updated 5 days ago
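The column criterion in the collection description can be illustrated with a minimal sketch. This is not how librarian-bots/dataset-column-search-api works internally; the helper names `missing_dpo_columns` and `is_dpo_compatible` and the sample column lists are hypothetical, and the check assumes exact, case-sensitive column names.

```python
from typing import Iterable

# Column names the collection description lists as the DPO criterion.
REQUIRED_DPO_COLUMNS = frozenset({"prompt", "chosen", "rejected"})


def missing_dpo_columns(columns: Iterable[str]) -> set[str]:
    """Return the required DPO columns that are absent from `columns`."""
    return set(REQUIRED_DPO_COLUMNS) - set(columns)


def is_dpo_compatible(columns: Iterable[str]) -> bool:
    """True when all three required columns are present (extra columns are fine)."""
    return not missing_dpo_columns(columns)


# Hypothetical column lists, for illustration only.
print(is_dpo_compatible(["prompt", "chosen", "rejected", "source"]))  # True
print(missing_dpo_columns(["question", "chosen", "rejected"]))  # {'prompt'}
```

A check like this only confirms that the columns exist; it says nothing about whether the 'chosen' and 'rejected' values are actually well-formed preference pairs.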