fbaldassarri/modello-italia-9b-autoround-w4g128-cpu

Tags: Text Generation, text-generation-inference, Inference Endpoints, 4-bit precision


fbaldassarri committed 18 days ago

Commit 2c2cff1 (1 parent: 3c609eb)

Updated README

Files changed (1)

README.md +13 -42

README.md CHANGED

@@ -21,6 +21,8 @@ This model has been quantized in INT4, group-size 128, and optimized for inferen
 ## 🚨 Reproducibility
 This model has been quantized using Intel [auto-round](https://github.com/intel/auto-round), based on [SignRound technique](https://arxiv.org/pdf/2309.05516v4).
 ```
 git clone https://github.com/fbaldassarri/model-conversion.git
@@ -28,6 +30,8 @@ cd model-conversion
 mkdir models
 huggingface-cli download --resume-download --local-dir sapienzanlp_modello-italia-9b --local-dir-use-symlinks False  sapienzanlp/modello-italia-9b
 ```
@@ -35,15 +39,15 @@ Then,
 ```
 python3 main.py \
---model_name  ./models/sapienzanlp_modello-italia-9b \
---device 0 \
---group_size 128 \
---bits 4 \
---iters 1000 \
---deployment_device 'cpu' \
---output_dir "./models/sapienzanlp_modello-italia-9b-int4" \
---train_bs 1 \
---gradient_accumulate_steps 8
 ```
 ## 🚨 Biases and Risks
@@ -74,36 +78,3 @@ For more information about this issue, please refer to our survey paper:
 **Modello Italia 9B INT4 group-size 128 cpu-optimized** has not been evaluated on standard benchmarks yet.
 If you would like to contribute with your evaluation, please feel free to submit a pull request.
-## How to use Modello Italia with Hugging Face transformers
-```python
-import torch
-import transformers as tr
-device = "cuda" if torch.cuda.is_available() else "cpu"
-tokenizer = tr.AutoTokenizer.from_pretrained("sapienzanlp/modello-italia-9b-bf16")
-model = tr.AutoModelForCausalLM.from_pretrained(
-  "sapienzanlp/modello-italia-9b-bf16",
-  device_map=device,
-  torch_dtype=torch.bfloat16
-)
-MY_SYSTEM_PROMPT_SHORT = (
-  "Tu sei Modello Italia, un modello di linguaggio naturale addestrato da iGenius."
-)
-prompt = "Ciao, chi sei?"
-messages = [
-  {"role": "system", "content": MY_SYSTEM_PROMPT_SHORT},
-  {"role": "user", "content": prompt},
-]
-tokenized_chat = tokenizer.apply_chat_template(
-  messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
-).to(device)
-out = model.generate(
-  tokenized_chat,
-  max_new_tokens=200,
-  do_sample=False
-)
-```

 ## 🚨 Reproducibility
 This model has been quantized using Intel [auto-round](https://github.com/intel/auto-round), based on [SignRound technique](https://arxiv.org/pdf/2309.05516v4).
 ```
 git clone https://github.com/fbaldassarri/model-conversion.git
 mkdir models
+cd models
 huggingface-cli download --resume-download --local-dir sapienzanlp_modello-italia-9b --local-dir-use-symlinks False  sapienzanlp/modello-italia-9b
 ```
 ```
 python3 main.py \
+  --model_name  ./models/sapienzanlp_modello-italia-9b \
+  --device 0 \
+  --group_size 128 \
+  --bits 4 \
+  --iters 1000 \
+  --deployment_device 'cpu' \
+  --output_dir "./models/sapienzanlp_modello-italia-9b-int4" \
+  --train_bs 1 \
+  --gradient_accumulate_steps 8
 ```
 ## 🚨 Biases and Risks
 **Modello Italia 9B INT4 group-size 128 cpu-optimized** has not been evaluated on standard benchmarks yet.
 If you would like to contribute with your evaluation, please feel free to submit a pull request.
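
The `main.py` invocation in this commit quantizes the weights to 4 bits (`--bits 4`), with one scale and zero-point per run of 128 consecutive weights (`--group_size 128`). The sketch below illustrates what asymmetric group-wise INT4 quantization of a single weight row looks like. It is a minimal sketch under stated assumptions, not auto-round's implementation: it uses plain round-to-nearest, whereas SignRound (the technique auto-round is based on) tunes how each weight is rounded using signed gradient descent, which is not shown. The helpers `quantize_group`, `dequantize_group`, `quantize_row` and `effective_bits_per_weight` are hypothetical names, not part of auto-round's API, and the storage layout used for the bits-per-weight estimate (one FP16 scale plus one 4-bit zero-point per group) is an assumption that may not match this checkpoint's on-disk format.

```python
from typing import List, Tuple

BITS = 4                # mirrors --bits 4 in the main.py call
GROUP_SIZE = 128        # mirrors --group_size 128
QMAX = 2 ** BITS - 1    # 15, the largest unsigned 4-bit code


def quantize_group(group: List[float]) -> Tuple[List[int], float, int]:
    """Asymmetric round-to-nearest (RTN) quantization of one group to 4-bit codes.

    Plain RTN for illustration only: SignRound would instead tune how each
    weight is rounded; that optimization is not shown here.
    """
    # Widen the range to include 0.0 so that zero maps to an exact code.
    lo = min(min(group), 0.0)
    hi = max(max(group), 0.0)
    scale = (hi - lo) / QMAX if hi > lo else 1.0
    zero = round(-lo / scale)  # integer zero-point in [0, QMAX]
    # Round each weight to the nearest code and clamp to the 4-bit range.
    codes = [min(max(round(w / scale) + zero, 0), QMAX) for w in group]
    return codes, scale, zero


def dequantize_group(codes: List[int], scale: float, zero: int) -> List[float]:
    """Map 4-bit codes back to approximate float weights."""
    return [(q - zero) * scale for q in codes]


def quantize_row(row: List[float]) -> List[Tuple[List[int], float, int]]:
    """Quantize each run of GROUP_SIZE consecutive weights with its own scale/zero-point."""
    return [quantize_group(row[i:i + GROUP_SIZE]) for i in range(0, len(row), GROUP_SIZE)]


def effective_bits_per_weight(scale_bits: int = 16, zero_bits: int = BITS) -> float:
    """Storage cost per weight, ASSUMING one FP16 scale and one 4-bit zero-point per group."""
    return BITS + (scale_bits + zero_bits) / GROUP_SIZE


if __name__ == "__main__":
    # Toy "weight row" of 256 values in [-0.5, 0.5], i.e. two groups of 128.
    row = [((i * 37) % 101 - 50) / 100.0 for i in range(256)]
    for g, (codes, scale, zero) in enumerate(quantize_row(row)):
        start = g * GROUP_SIZE
        approx = dequantize_group(codes, scale, zero)
        err = max(abs(a - w) for a, w in zip(approx, row[start:start + GROUP_SIZE]))
        print(f"group {g}: scale={scale:.4f} zero={zero} max_abs_error={err:.4f}")
    print(f"~{effective_bits_per_weight()} bits per weight under the assumed layout")
```

Each group gets its own scale and zero-point, so the rounding error is bounded by half of that group's scale. This is the trade-off behind the group size: smaller groups follow local weight ranges more closely but store more metadata per weight.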