Tulu 3 SFT Mixture by AllenAI is a massive, high-quality, multilingual dataset for fine-tuning language models.
Unfortunately, it was missing the "language" column.
I added it using the good old fastText.
Check out the dataset here: anakin87/tulu-3-sft-mixture-with-language
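
For anyone curious how a "language" column can be added with fastText, here is a minimal sketch, not necessarily the exact script used for this dataset. It assumes each row has a `messages` list of `{"role", "content"}` dicts and that the fastText language-identification model has been downloaded locally as `lid.176.bin`. The helper names (`load_fasttext_predictor`, `first_user_text`, `add_language`), the choice to classify only the first user message, and the `min_score` threshold are all illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

# A predictor maps a piece of text to (language_code, confidence_score).
Predictor = Callable[[str], Tuple[str, float]]


def load_fasttext_predictor(model_path: str = "lid.176.bin") -> Predictor:
    """Wrap a fastText language-ID model as a text -> (language, score) callable.

    Assumes the model file already exists at `model_path` (hypothetical location).
    """
    import fasttext  # imported lazily so the helpers below work without it

    model = fasttext.load_model(model_path)

    def predict(text: str) -> Tuple[str, float]:
        # fastText returns labels such as "__label__en" with their probabilities.
        labels, scores = model.predict(text, k=1)
        return labels[0].replace("__label__", ""), float(scores[0])

    return predict


def first_user_text(messages: List[Dict[str, str]]) -> str:
    """Return the first user message on a single line (fastText rejects newlines)."""
    for message in messages:
        if message.get("role") == "user":
            return message.get("content", "").replace("\n", " ").strip()
    return ""


def add_language(row: dict, predict: Predictor, min_score: float = 0.0) -> dict:
    """Return a copy of `row` with a "language" column added."""
    text = first_user_text(row["messages"])
    if not text:
        return {**row, "language": "unknown"}
    language, score = predict(text)
    # Fall back to "unknown" when the model is not confident enough.
    return {**row, "language": language if score >= min_score else "unknown"}
```

With the Hugging Face `datasets` library, something like `dataset.map(lambda row: add_language(row, predict))` would apply this to every row, where `predict` comes from `load_fasttext_predictor()`. Classifying only the first user message keeps things fast, but it can mislabel conversations that switch languages midway.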