--- datasets: - taesiri/TinyStories-Farsi library_name: transformers model_name: LLaMA-3.1-8B-Persian-Instruct pipeline_tag: text-generation tags: - language-model - fine-tuned - instruction-following - PEFT - LoRA - BitsAndBytes - Persian - Farsi - text-generation --- # LLaMA-3.1-8B-Persian-Instruct This model is a fine-tuned version of the `meta-llama/Meta-Llama-3.1-8B-Instruct` model, specifically tailored for generating and understanding Persian text. The fine-tuning was conducted using the [TinyStories-Farsi](https://huggingface.co/datasets/taesiri/TinyStories-Farsi) dataset, which includes a diverse set of short stories in Persian. The primary goal of this fine-tuning was to enhance the model's performance in instruction-following tasks within the Persian language. ## Model Details ### Model Description This model is a fine-tuned version of Llama-3.1-8B-Instruct that meta has released. By training this model on persian short stories, the new model gets to understand the relation between English and Persian in a more meaning full way. - **Developed by:** Meta AI - **Model type:** Language Model - **License:** Apache 2.0 - **Base Model:** `meta-llama/Meta-Llama-3.1-8B-Instruct` ### Model Sources - **Repository:** [Llama-3.1-8B-Instruct on Hugging Face](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) ## Training Details ### Training Data The model was fine-tuned using the [TinyStories-Farsi](https://huggingface.co/datasets/taesiri/TinyStories-Farsi) dataset. This dataset provided a rich and diverse linguistic context, helping the model better understand and generate text in Persian. ### Training Procedure The fine-tuning process was conducted using the following setup: - **Epochs:** 4 - **Batch Size:** 8 - **Gradient Accumulation Steps:** 2 - **Hardware:** NVIDIA A100 GPU ### Fine-Tuning Strategy To make the fine-tuning process efficient and effective, PEFT (Parameter-Efficient Fine-Tuning) techniques were employed. Specifically, the `BitsAndBytesConfig(load_in_4bit=True)` configuration was used, allowing the model to be fine-tuned in 4-bit precision. This approach significantly reduced the computational resources required while maintaining high performance, resulting in a training time of approximately 2 hours. The use of `BitsAndBytesConfig(load_in_4bit=True)` helped reduce the environmental impact by minimizing the computational resources required. ## Uses ### Direct Use This model is well-suited for generating text in Persian, particularly for instruction-following tasks. It can be used in applications like chatbots, customer support systems, educational tools, and more where accurate and context-aware Persian language generation is needed. ### Out-of-Scope Use The model is not intended for tasks requiring deep reasoning, complex multi-turn conversations, or contexts beyond the immediate prompt. It is also not designed for generating text in languages other than Persian. ## How to Get Started with the Model Here is how you can use this model: ```python from transformers import AutoModelForCausalLM, AutoTokenizer import torch # Specify the combined model model_name = "AmirMohseni/Llama-3.1-8B-Instruct-Persian-finetuned-sft" # Load the model and tokenizer model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto") tokenizer = AutoTokenizer.from_pretrained(model_name) # Ensure pad_token is set (if not already set) if tokenizer.pad_token is None: tokenizer.add_special_tokens({'pad_token': tokenizer.eos_token}) # Check if CUDA is available, otherwise use CPU device = "cuda" if torch.cuda.is_available() else "cpu" model = model.to(device) # Example usage input_text = "چطوری میتونم به اطلاعات درباره ی سهام شرکت های آمریکایی دست پیدا کنم؟" # Tokenize the input inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True).to(device) # Generate text outputs = model.generate( inputs['input_ids'], attention_mask=inputs['attention_mask'], max_length=512, pad_token_id=tokenizer.pad_token_id ) # Decode and print the output response = tokenizer.decode(outputs[0], skip_special_tokens=True) print(response) ```