---
license: cc-by-4.0
language:
- he
inference: false
---
# **DictaLM**: A Large Generative Language Model for Modern Hebrew 

A large generative pretrained transformer (GPT) language model for Hebrew, released [link to be added].

This model was fine-tuned to follow instructions, such as:
- General questions: 
    ```
    מה זה בית ספר?
    ```

    ```
    קיבלתי חתך קל באצבע. מהי הדרך הנכונה לטפל בזה?
    ```
- Simple tasks:
    ```
    תציע כמה רעיונות לפעילות עם ילדים בני 5:
    ```
- Information retrieval from a paragraph context:
     
    ```
        המסיק הידני הוא הדרך המסורתית והעתיקה לקטיף זיתים. שיטה זו דורשת כוח אדם רב באופן יחסי ועדיין מקובלת בישראל ובמקומות רבים בעולם. שיטות מסיק ידני מאפשרות חיסכון עלויות במקומות בהם כוח האדם זול ועלות השיטות הממוכנות גבוהה. לזיתים המיועדים למאכל (לכבישה, בניגוד לזיתים לשמן) מתאים יותר מסיק ידני כיוון שהפרי פחות נפגע במהלך המסיק בשיטה זו (פגיעות בקליפת הפרי בזיתים לשמן פחות משמעותיות). כמו כן מועדף מסיק ידני באזורים בהם הטופוגרפיה המקומית או צפיפות העצים לא מאפשרים גישה נוחה לכלים מכנים. השיטה הידנית מאפשרת גם למסוק עצים שונים במועדים שונים, בהתאם לקצב הבשלת הפרי הטבעי בכל עץ.
        
        על בסיס הפסקה הזאת, מה הוא היתרון של מסיק ידני מבחינת קצב הבשלת הפרי?
    ```
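
For the paragraph-context case, the prompt is simply the paragraph followed by the question. A minimal sketch of a helper that assembles such a prompt is shown below. The function name `build_context_prompt` is hypothetical, and the blank-line separator and trailing newline mirror the examples in this card rather than a documented requirement of the model.

```python
def build_context_prompt(paragraph: str, question: str) -> str:
    """Join a context paragraph and a question into one prompt string."""
    # Layout mirrors the example above: paragraph, blank line, question.
    # The trailing newline is an assumption taken from the sample-usage
    # prompt in this card, not a documented requirement.
    return f"{paragraph.strip()}\n\n{question.strip()}\n"


# Last sentence of the example paragraph and its question, copied from above.
paragraph = "השיטה הידנית מאפשרת גם למסוק עצים שונים במועדים שונים, בהתאם לקצב הבשלת הפרי הטבעי בכל עץ."
question = "על בסיס הפסקה הזאת, מה הוא היתרון של מסיק ידני מבחינת קצב הבשלת הפרי?"

prompt = build_context_prompt(paragraph, question)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the fixed `prompt` used in the sample below.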

## Sample usage:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained('dicta-il/dictalm-7b-instruct')
# If CUDA is not available, remove the `.cuda()` call at the end of the next line
model = AutoModelForCausalLM.from_pretrained('dicta-il/dictalm-7b-instruct', trust_remote_code=True).cuda()

model.eval()

with torch.inference_mode():
    prompt = 'תציע כמה רעיונות לפעילות עם ילדים בני 5:\n'
    kwargs = dict(
        inputs=tokenizer(prompt, return_tensors='pt').input_ids.to(model.device),
        do_sample=True,
        top_k=50,
        top_p=0.95,
        temperature=0.75,
        max_length=100,
        min_new_tokens=5
    )
    
    print(tokenizer.batch_decode(model.generate(**kwargs), skip_special_tokens=True))
```

### Alternative ways to initialize the model:

If you have multiple smaller GPUs and the `accelerate` package is installed, you can load the model split across the devices:
```python
model = AutoModelForCausalLM.from_pretrained('dicta-il/dictalm-7b-instruct', trust_remote_code=True, device_map='auto')
```

If you are running on Linux and have the `bitsandbytes` package installed, you can load the model in 8-bit inference mode (or in 4-bit mode by passing `load_in_4bit=True` instead):
```python
model = AutoModelForCausalLM.from_pretrained('dicta-il/dictalm-7b-instruct', trust_remote_code=True, load_in_8bit=True)
```
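
To get a feel for why lower precision helps on smaller GPUs, here is a rough back-of-the-envelope estimate of the memory needed for the weights alone. It assumes about 7 billion parameters (suggested by the "7b" in the model name, not an exact count) and ignores activations, the key/value cache, and any quantization overhead, so real usage will be higher.

```python
# Rough weights-only memory estimate. ASSUMED_PARAMS is an assumption based
# on the "7b" in the model name, not an exact parameter count.
ASSUMED_PARAMS = 7_000_000_000

# Storage cost per parameter for common precisions.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gb(num_params: int, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


for name, size in BYTES_PER_PARAM.items():
    print(f"{name:>8}: ~{weight_memory_gb(ASSUMED_PARAMS, size):.1f} GB")
```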

If you have [FlashAttention](https://github.com/Dao-AILab/flash-attention) installed in your environment, you can instruct the model to use the flash attention implementation (either V1 or V2, whichever is installed):
```python
model = AutoModelForCausalLM.from_pretrained('dicta-il/dictalm-7b-instruct', trust_remote_code=True, use_flash_attention=True)
```



You can pass many other parameters in `kwargs` to change the output (greedy decoding, beam search, different sampling configurations, longer or shorter responses, etc.).

The full list of parameters accepted by the `generate` function is available [here](https://huggingface.co/docs/transformers/v4.33.0/en/main_classes/text_generation#transformers.GenerationMixin.generate).
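
As an illustration of how the decoding strategy can be switched, the sketch below collects a few keyword-argument presets for `generate`. The preset names, the helper `make_generate_kwargs`, and the specific values (for example `num_beams=4`) are assumptions chosen for the example, not tuned recommendations; the parameter names themselves are standard `generate` arguments.

```python
# Illustrative presets; the values are assumptions, not tuned recommendations.
GENERATION_PRESETS = {
    # Greedy decoding: always pick the most likely next token.
    "greedy": dict(do_sample=False, num_beams=1),
    # Beam search: track several candidate sequences, return the best one.
    "beam_search": dict(do_sample=False, num_beams=4, early_stopping=True),
    # Sampling: the same settings as the sample usage in this card.
    "sampling": dict(do_sample=True, top_k=50, top_p=0.95, temperature=0.75),
}


def make_generate_kwargs(preset: str, max_new_tokens: int = 100) -> dict:
    """Return keyword arguments for `model.generate` for the named preset."""
    kwargs = dict(GENERATION_PRESETS[preset])  # copy so presets stay unchanged
    # max_new_tokens limits only the generated tokens, whereas max_length
    # also counts the tokens of the prompt.
    kwargs["max_new_tokens"] = max_new_tokens
    return kwargs


beam_kwargs = make_generate_kwargs("beam_search", max_new_tokens=50)
print(beam_kwargs)

# With a loaded model and tokenized prompt (input_ids is a placeholder name):
# output = model.generate(inputs=input_ids, **beam_kwargs)
```

Using `max_new_tokens` rather than `max_length` keeps the response length independent of the prompt length, which is convenient for long paragraph-context prompts.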


## Citation

If you use DictaLM in your research, please cite ```ADD CITATION HERE```

**BibTeX:**

```ADD BIBTEXT HERE```

## License

[![CC BY 4.0][cc-by-shield]][cc-by]

This work is licensed under a
[Creative Commons Attribution 4.0 International License][cc-by].

[![CC BY 4.0][cc-by-image]][cc-by]

[cc-by]: http://creativecommons.org/licenses/by/4.0/
[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg