Introducing FineWeb-C, a community-built dataset for improving language models in ALL languages.
Inspired by FineWeb-Edu, the community is labelling the educational quality of texts in many languages.
318 annotators, 32K+ annotations, 12 languages - and growing!
data-is-better-together/fineweb-c