Shaltiel committed
Commit 44cbd88
1 Parent(s): 735b6c7

Create README.md

Files changed (1)
  1. README.md +90 -0

README.md ADDED
@@ -0,0 +1,90 @@
---
license: apache-2.0
pipeline_tag: text-generation
language:
- en
- he
tags:
- instruction-tuned
base_model: dicta-il/dictalm2.0
inference:
  parameters:
    temperature: 0.7
---

[<img src="https://i.ibb.co/5Lbwyr1/dicta-logo.jpg" width="300px"/>](https://dicta.org.il)

# Model Card for DictaLM-2.0-Instruct

The DictaLM-2.0-Instruct Large Language Model (LLM) is an instruct fine-tuned version of the [DictaLM-2.0](https://huggingface.co/dicta-il/dictalm2.0) generative model, trained on a variety of conversation datasets.

For full details of this model, please read our [release blog post](https://example.com).

This repository contains the GPTQ 4-bit quantized version of [DictaLM-2.0-Instruct](https://huggingface.co/dicta-il/dictalm2.0-instruct), the instruct-tuned model designed for chat.

You can view and access the full collection of base/instruct, unquantized/quantized versions of `DictaLM-2.0` [here](https://huggingface.co/collections/dicta-il/dicta-lm-20-collection-661bbda397df671e4a430c27).

## Instruction format

In order to leverage instruction fine-tuning, your prompt should be surrounded by `[INST]` and `[/INST]` tokens. The very first instruction should begin with the begin-of-sentence token id; subsequent instructions should not. The assistant generation is terminated by the end-of-sentence token id.

E.g.
```
text = """<s>[INST] What is your favourite condiment? [/INST]
Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s>[INST] Do you have mayonnaise recipes? [/INST]"""
```

This format is available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating) via the `apply_chat_template()` method.

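As a quick sanity check, you can render the template to a plain string without tokenizing. This is a minimal sketch (the exact whitespace and special-token placement depend on the template bundled with the tokenizer):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dicta-il/dictalm2.0-instruct-GPTQ")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice."},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]

# tokenize=False returns the formatted prompt string instead of token ids,
# so you can inspect the [INST] ... [/INST] markup directly.
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)
```
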
## Example Code

Running this code requires under 5 GB of GPU VRAM.

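Note: loading a GPTQ-quantized checkpoint through `transformers` generally also requires the `optimum` package together with a GPTQ backend such as `auto-gptq` (or `gptqmodel` with newer Transformers releases). The exact package set is an assumption here, e.g. `pip install transformers optimum auto-gptq`; check the Transformers GPTQ documentation for your installed version.
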
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("dicta-il/dictalm2.0-instruct-GPTQ", device_map=device)
tokenizer = AutoTokenizer.from_pretrained("dicta-il/dictalm2.0-instruct-GPTQ")

messages = [
    {"role": "user", "content": "מה הרוטב אהוב עליך?"},
    {"role": "assistant", "content": "טוב, אני די מחבב כמה טיפות מיץ לימון סחוט טרי. זה מוסיף בדיוק את הכמות הנכונה של טעם חמצמץ לכל מה שאני מבשל במטבח!"},
    {"role": "user", "content": "האם יש לך מתכונים למיונז?"}
]

encoded = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)

generated_ids = model.generate(encoded, max_new_tokens=50, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
# <s> [INST] מה הרוטב אהוב עליך? [/INST]
# טוב, אני די מחבב כמה טיפות מיץ לימון סחוט טרי. זה מוסיף בדיוק את הכמות הנכונה של טעם חמצמץ לכל מה שאני מבשל במטבח!</s> [INST] האם יש לך מתכונים למיונז? [/INST]
# בטח, הנה מתכון קל מאוד למיונז ביתי:
#
# מרכיבים:
# - 2 ביצים גדולות
# - 1 כף חרדל דיז'ון
# - 2 כפות
# (it stopped early because we set max_new_tokens=50)
```

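The call above uses `do_sample=True` with the default sampling settings and a small 50-token budget, which is why the answer is cut off. The model card metadata suggests `temperature: 0.7`; the snippet below is a sketch that reuses `model`, `tokenizer`, and `encoded` from the example above and passes explicit sampling parameters with a larger token budget (the values other than the temperature are illustrative assumptions, not tuned recommendations):

```python
# Illustrative sampling settings: temperature mirrors the 0.7 in the card
# metadata; the remaining values are assumptions rather than tuned defaults.
generated_ids = model.generate(
    encoded,
    max_new_tokens=512,   # larger budget than the 50 tokens used above
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(tokenizer.batch_decode(generated_ids)[0])
```
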
## Model Architecture

DictaLM-2.0-Instruct follows the [Zephyr-7B-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) recipe for fine-tuning an instruct model, with an extended instruct dataset for Hebrew.

## Limitations

The DictaLM 2.0 Instruct model is a demonstration that the base model can be fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We look forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.

## Citation

If you use this model, please cite:

```bibtex
[Will be added soon]
```