This repo contains SlovenianGPT - the best open-source 7B base LLM for the Slovenian language, developed by Aleksa Gordić (see the LinkedIn announcement).

If you're interested in more powerful instruct models for the Slovenian language, feel free to reach out via email (surname.name at gmail com) or LinkedIn.

SlovenianGPT

SlovenianGPT eval results compared to Mistral 7B, LLaMA 2 7B, and Gemma (also see this LinkedIn post for more info):

SlovenianGPT eval results

Instruct-SlovenianGPT eval results (LinkedIn post):

Instruct-SlovenianGPT eval results

Evals were computed using https://github.com/gordicaleksa/slovenian-llm-eval.

The model was trained on tens of billions of Slovenian-language tokens and is based on Mistral 7B.

Notes

  1. SlovenianGPT is a base model and therefore does not have any moderation mechanisms.

  2. Since it's a base model, it won't follow your instructions; it's just a powerful autocomplete engine.

  3. If you want access to much more powerful Slovenian LLMs, feel free to reach out via email (surname.name at gmail com) or LinkedIn.
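Since SlovenianGPT is a plain autocompletion model hosted on Hugging Face, a minimal way to try it is to load it with the `transformers` library and let it continue a Slovenian prompt. The sketch below is illustrative, not an official usage snippet: the model id `gordicaleksa/SlovenianGPT` is taken from the citation URL, and the generation settings (bfloat16 weights, greedy decoding, cutting the continuation at the first newline) are assumptions, not documented defaults.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Model id taken from the Hugging Face URL in the citation below.
MODEL_ID = "gordicaleksa/SlovenianGPT"


def first_line(completion: str) -> str:
    """Base models keep autocompleting past the answer; a common trick
    is to cut the output at the first newline to get one continuation."""
    return completion.split("\n", 1)[0].strip()


def generate_sample() -> str:
    """Download the model and continue a Slovenian prompt.

    Not called at import time because it pulls ~14 GB of weights.
    """
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # the published weights are BF16
        device_map="auto",
    )
    # Plain continuation prompt -- no chat template, this is a base model.
    prompt = "Ljubljana je glavno mesto"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=40, do_sample=False)
    return first_line(tokenizer.decode(out[0], skip_special_tokens=True))
```

Because the model has no instruction tuning or moderation, treat the output as raw text continuation and apply your own post-processing and filtering.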

Credits

The data for the project was obtained with the help of Nikola Ljubešić.

Also a big thank you to the following individuals:

Citation

@misc{SlovenianGPT,
  author       = "Gordić, Aleksa",
  title        = "SlovenianGPT - an open-source LLM for the Slovenian language",
  year         = "2024",
  howpublished = "\url{https://huggingface.co/gordicaleksa/SlovenianGPT}",
}