if you can record the problem and share it there , or on the forums in your own post , please dont be shy because i'm not sure but i do think it helps π€π€π€
boomers still pick zenodo.org instead of huggingface ??? absolutely clownish nonsense , my random datasets have 30x more downloads and views than front page zenodos ... gonna write a comparison blog , but yeah... cringe.
really enjoying sharing cool genomics and protein datasets on the hub these days , check out our cool new org : https://huggingface.co/seq-to-pheno
scroll down for the datasets, still figuring out how to optimize for discoverability , i do think on that part it will be better than zenodo[dot}org , it would be nice to write a tutorial about that and compare : we already have more downloads than most zenodo datasets from famous researchers !
did you know that https://huggingface.co/lmms-lab released a new version of ππLlava on thursday ? Now it has π₯video understanding ! check it out ππ»
perhaps the largest single training dataset of high quality text to date of 7.8 trillion tokens in 35 European languages and code.
the best part : the data was correctly licenced so it's actually future-proof!
the completions model is really creative and instruct fine tuned version is very good also.
now you can use such models for multi-lingual enterprise applications with further finetunes , long response generation, structured outputs (coding) also works.
@mlabonne hey there ππ»ββοΈ I kinda got obsessed with your great model , and i found the endpoint for it in lambda labs, but basically i got rate limited / banned for trying to make my DPO dataset project, i was wondering if you all had an open ai compatible solution for me to make a great "thinking" sft + dpo dataset with all the splits ππ»ππ» kinda desparate , it's true , but was looking forward to a nice write ups πππ
βοΈInkubaLM has been trained from scratch using 1.9 billion tokens of data for five African languages, along with English and French data, totaling 2.4 billion tokens of data. It is capable of understanding and generating content in five African languages: Swahili, Yoruba, Hausa, isiZulu, and isiXhosa, as well as English and French.