Commit 2c2cff1
1 Parent(s): 3c609eb
Updated README

README.md CHANGED
@@ -21,6 +21,8 @@ This model has been quantized in INT4, group-size 128, and optimized for inferen
 ## 🚨 Reproducibility
 This model has been quantized using Intel [auto-round](https://github.com/intel/auto-round), based on [SignRound technique](https://arxiv.org/pdf/2309.05516v4).
 
+
+
 ```
 git clone https://github.com/fbaldassarri/model-conversion.git
 
@@ -28,6 +30,8 @@ cd model-conversion
 
 mkdir models
 
+cd models
+
 huggingface-cli download --resume-download --local-dir sapienzanlp_modello-italia-9b --local-dir-use-symlinks False sapienzanlp/modello-italia-9b
 ```
 
@@ -35,15 +39,15 @@ Then,
 
 ```
 python3 main.py \
---model_name ./models/sapienzanlp_modello-italia-9b \
---device 0 \
---group_size 128 \
---bits 4 \
---iters 1000 \
---deployment_device 'cpu' \
---output_dir "./models/sapienzanlp_modello-italia-9b-int4" \
---train_bs 1 \
---gradient_accumulate_steps 8
+--model_name ./models/sapienzanlp_modello-italia-9b \
+--device 0 \
+--group_size 128 \
+--bits 4 \
+--iters 1000 \
+--deployment_device 'cpu' \
+--output_dir "./models/sapienzanlp_modello-italia-9b-int4" \
+--train_bs 1 \
+--gradient_accumulate_steps 8
 ```
 
 ## 🚨 Biases and Risks
@@ -74,36 +78,3 @@ For more information about this issue, please refer to our survey paper:
 **Modello Italia 9B INT4 group-size 128 cpu-optimized** has not been evaluated on standard benchmarks yet.
 If you would like to contribute with your evaluation, please feel free to submit a pull request.
 
-## How to use Modello Italia with Hugging Face transformers
-
-```python
-import torch
-import transformers as tr
-
-device = "cuda" if torch.cuda.is_available() else "cpu"
-
-tokenizer = tr.AutoTokenizer.from_pretrained("sapienzanlp/modello-italia-9b-bf16")
-model = tr.AutoModelForCausalLM.from_pretrained(
-    "sapienzanlp/modello-italia-9b-bf16",
-    device_map=device,
-    torch_dtype=torch.bfloat16
-)
-
-MY_SYSTEM_PROMPT_SHORT = (
-    "Tu sei Modello Italia, un modello di linguaggio naturale addestrato da iGenius."
-)
-prompt = "Ciao, chi sei?"
-messages = [
-    {"role": "system", "content": MY_SYSTEM_PROMPT_SHORT},
-    {"role": "user", "content": prompt},
-]
-tokenized_chat = tokenizer.apply_chat_template(
-    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
-).to(device)
-
-out = model.generate(
-    tokenized_chat,
-    max_new_tokens=200,
-    do_sample=False
-)
-```
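The README in this commit describes the model as "INT4, group-size 128". As a rough illustration of what those two settings mean, the sketch below quantizes a row of weights in groups of 128 values, each group with its own scale and offset, using plain round-to-nearest. This is an assumption-laden toy, not what auto-round does: SignRound tunes the rounding with signed gradient descent over many iterations, whereas this sketch only rounds once. All function names here (`quantize_group`, `quantize_row`, and so on) are hypothetical and do not come from auto-round or from `main.py`.

```python
import math


def quantize_group(values, bits=4):
    """Asymmetric round-to-nearest quantization of one group of floats."""
    qmax = (1 << bits) - 1                 # 15 for 4-bit codes
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax or 1.0        # fall back to 1.0 for a constant group
    # Map each value to the nearest integer code and clamp to [0, qmax].
    codes = [min(qmax, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Recover approximate floats from integer codes."""
    return [c * scale + lo for c in codes]


def quantize_row(row, group_size=128, bits=4):
    """Split a row into consecutive groups and quantize each one separately."""
    return [
        quantize_group(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]


def dequantize_row(groups):
    """Concatenate the dequantized groups back into one row."""
    out = []
    for codes, scale, lo in groups:
        out.extend(dequantize_group(codes, scale, lo))
    return out


# Usage: a 256-element row becomes two groups of 128 values each.
row = [math.sin(0.1 * i) for i in range(256)]
groups = quantize_row(row)
restored = dequantize_row(groups)
max_error = max(abs(a - b) for a, b in zip(row, restored))
print(len(groups), max_error)
```

A smaller group size gives each scale fewer values to cover, which generally lowers the rounding error at the cost of storing more scales and offsets. With round-to-nearest, the per-value error stays within half of the group's scale.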