Introducing FineWeb-C, a community-built dataset for improving language models in ALL languages.
Inspired by FineWeb-Edu, the community is labelling the educational quality of texts in many languages.
318 annotators, 32K+ annotations, 12 languages - and growing!
data-is-better-together/fineweb-c