---
base_model: unsloth/Mistral-Nemo-Base-2407-bnb-4bit
library_name: transformers
language:
- ar
pipeline_tag: text-generation
datasets:
- MahmoudIbrahim/Arabic_NVIDIA
---
- **Developed by:** Mahmoud Ibrahim

**How to use:**
```bash
pip install transformers bitsandbytes accelerate
```
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from IPython.display import Markdown
import textwrap
# Load the tokenizer and the 4-bit quantized model
tokenizer = AutoTokenizer.from_pretrained("MahmoudIbrahim/Mistral_12b_Arabic")
model = AutoModelForCausalLM.from_pretrained(
    "MahmoudIbrahim/Mistral_12b_Arabic",
    load_in_4bit=True,
    device_map="auto",
)
alpaca_prompt = """فيما يلي تعليمات تصف مهمة، إلى جانب مدخل يوفر سياقاً إضافياً. اكتب استجابة تُكمل الطلب بشكل مناسب.
### التعليمات:
{}
### الاستجابة:
{}"""
# Format the prompt with the instruction and an empty output placeholder
formatted_prompt = alpaca_prompt.format(
    "كيف يمكن للحكومة المصرية والمجتمع ككل أن يعززوا من قدرة البلاد على تحقيق التنمية المستدامة؟ ",  # instruction
    ""  # leave the response blank for generation
)
# Tokenize the formatted string and move the tensors to the model's device
input_ids = tokenizer.encode(formatted_prompt, return_tensors="pt").to(model.device)
def to_markdown(text):
    text = text.replace('•', '*')
    return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))
# Generate text
output = model.generate(
    input_ids,
    max_length=128,          # adjust the maximum length as needed
    num_return_sequences=1,  # number of generated responses
    no_repeat_ngram_size=2,  # prevent n-gram repetition
    do_sample=True,          # enable sampling so top_k/top_p/temperature take effect
    top_k=50,                # keep only the top-k tokens
    top_p=0.9,               # nucleus sampling
    temperature=0.7,         # control creativity
)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
to_markdown(generated_text)
```
**The model response:**
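Because `tokenizer.decode` returns the prompt together with the generation, it can be handy to keep only the model's answer. A minimal helper (a hypothetical addition, not part of the model's API) can split on the template's response marker `### الاستجابة:`:

```python
def extract_response(generated_text: str) -> str:
    """Return only the text after the Alpaca template's response marker."""
    marker = "### الاستجابة:"
    # Take everything after the last occurrence of the marker and trim whitespace
    return generated_text.split(marker)[-1].strip()
```

For example, `extract_response(generated_text)` applied to the decoded output above yields just the generated answer, without the instruction block.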