🙋🏻♂️ Hey there folks,
🦎 the Salamandra release by @mvillegas and team at @BSC_CNS ( https://huggingface.co/BSC-LT ) is absolutely impressive so far!
it comes with perhaps the largest single training dataset of high-quality text to date: 7.8 trillion tokens in 35 European languages and code.
the best part: the data was correctly licensed, so it's actually future-proof!
the completions model is really creative, and the instruct fine-tuned version is very good too.
now you can use these models for multilingual enterprise applications with further fine-tunes; long response generation and structured outputs (coding) also work.
check out 👇🏻
the collection: BSC-LT/salamandra-66fc171485944df79469043a
the repo: https://github.com/langtech-bsc/salamandra
7B-Instruct demo: Tonic/Salamandra-7B