---
license: gemma
language:
- or
- en
---

# odia-gemma-2b-base (Pre-trained)

Odia-Gemma-2B-Base is a pre-trained Odia large language model with 2 billion parameters, based on Google's Gemma 2B. The model is pre-trained on the CulturaX-Odia dataset, a filtered version of the original CulturaX dataset for Odia text, containing 49 million tokens. The CulturaX-Odia dataset is sourced from mC4 and four distinct OSCAR corpora.

For more details about the model, data, training procedure, and evaluations, please refer to the blog [post]().

## Model Description

* Model type: A 2B pre-trained decoder-only model
* Primary Language(s): Odia and English
* License: Gemma Terms of Use

**NOTE**

This is not an instruction-tuned model, so it may not follow human instructions without one-/few-shot prompting or instruction fine-tuning. The model has no moderation mechanisms and may generate harmful or inappropriate responses. It is recommended to fine-tune it on the task(s) you are interested in before use; a minimal generation sketch is given after the Contributions section below.

### Citation Information

If you find this model useful, please consider giving 👏 and citing:

```
@misc{odia-gemma-2b-base,
  author       = {Sambit Sekhar and Shantipriya Parida and Debasish Dhal and Guneet Singh Kohli},
  title        = {OdiaGenAI Introduces Gemma 2B Pre-Trained LLM Catered to Odia Speakers},
  year         = {2024},
  publisher    = {Hugging Face},
  journal      = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/OdiaGenAI}},
}
```

### Contributions

- Sambit Sekhar
- Shantipriya Parida
- Debasish Dhal
- Guneet Singh Kohli
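
### Inference

The snippet below is a minimal sketch of loading the model with the Hugging Face `transformers` library and generating a plain completion, which is how a base (non-instruction-tuned) model is typically prompted. The repository id `OdiaGenAI/odia-gemma-2b-base` and the example prompt are assumptions for illustration and are not taken from this card.

```python
# Minimal generation sketch with Hugging Face transformers.
# NOTE: the repository id below is assumed from the model name; verify it on
# the OdiaGenAI organization page before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OdiaGenAI/odia-gemma-2b-base"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# A base model is a text continuer, so frame the task as a completion
# (optionally with one-/few-shot examples) rather than an instruction.
prompt = "ଭାରତର ରାଜଧାନୀ ହେଉଛି"  # example Odia prompt: "The capital of India is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```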