fbaldassarri committed on
Commit 2c2cff1 • 1 Parent(s): 3c609eb

Updated README

Files changed (1)
  1. README.md +13 -42
README.md CHANGED
@@ -21,6 +21,8 @@ This model has been quantized in INT4, group-size 128, and optimized for inferen
  ## 🚨 Reproducibility
  This model has been quantized using Intel [auto-round](https://github.com/intel/auto-round), based on [SignRound technique](https://arxiv.org/pdf/2309.05516v4).
 
+
+
  ```
  git clone https://github.com/fbaldassarri/model-conversion.git
 
@@ -28,6 +30,8 @@ cd model-conversion
 
  mkdir models
 
+ cd models
+
  huggingface-cli download --resume-download --local-dir sapienzanlp_modello-italia-9b --local-dir-use-symlinks False sapienzanlp/modello-italia-9b
  ```
 
@@ -35,15 +39,15 @@ Then,
 
  ```
  python3 main.py \
- --model_name ./models/sapienzanlp_modello-italia-9b \
- --device 0 \
- --group_size 128 \
- --bits 4 \
- --iters 1000 \
- --deployment_device 'cpu' \
- --output_dir "./models/sapienzanlp_modello-italia-9b-int4" \
- --train_bs 1 \
- --gradient_accumulate_steps 8
+ --model_name ./models/sapienzanlp_modello-italia-9b \
+ --device 0 \
+ --group_size 128 \
+ --bits 4 \
+ --iters 1000 \
+ --deployment_device 'cpu' \
+ --output_dir "./models/sapienzanlp_modello-italia-9b-int4" \
+ --train_bs 1 \
+ --gradient_accumulate_steps 8
  ```
 
  ## 🚨 Biases and Risks
@@ -74,36 +78,3 @@ For more information about this issue, please refer to our survey paper:
  **Modello Italia 9B INT4 group-size 128 cpu-optimized** has not been evaluated on standard benchmarks yet.
  If you would like to contribute with your evaluation, please feel free to submit a pull request.
 
- ## How to use Modello Italia with Hugging Face transformers
-
- ```python
- import torch
- import transformers as tr
-
- device = "cuda" if torch.cuda.is_available() else "cpu"
-
- tokenizer = tr.AutoTokenizer.from_pretrained("sapienzanlp/modello-italia-9b-bf16")
- model = tr.AutoModelForCausalLM.from_pretrained(
- "sapienzanlp/modello-italia-9b-bf16",
- device_map=device,
- torch_dtype=torch.bfloat16
- )
-
- MY_SYSTEM_PROMPT_SHORT = (
- "Tu sei Modello Italia, un modello di linguaggio naturale addestrato da iGenius."
- )
- prompt = "Ciao, chi sei?"
- messages = [
- {"role": "system", "content": MY_SYSTEM_PROMPT_SHORT},
- {"role": "user", "content": prompt},
- ]
- tokenized_chat = tokenizer.apply_chat_template(
- messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
- ).to(device)
-
- out = model.generate(
- tokenized_chat,
- max_new_tokens=200,
- do_sample=False
- )
- ```
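
The quantization step in this diff passes nine flags to `main.py`, and the updated README now runs `cd models` before the download, so the relative `./models/...` paths depend on which directory the command is run from. As a minimal sketch, assuming the flags shown in the README are accepted as written, the same invocation can be assembled in Python with every path resolved against the repository root. The helper name `build_quantize_command` and its `repo_root` and `model_dir` parameters are hypothetical and are not part of the `model-conversion` repository.

```python
import shlex
from pathlib import Path


def build_quantize_command(repo_root, model_dir="sapienzanlp_modello-italia-9b"):
    """Return the argv list for the README's main.py call.

    Hypothetical helper: the flag names and values are copied from the
    README diff; nothing here is verified against main.py itself.
    """
    root = Path(repo_root)
    models = root / "models"  # directory created by `mkdir models` in the README
    return [
        "python3", str(root / "main.py"),
        "--model_name", str(models / model_dir),
        "--device", "0",
        "--group_size", "128",
        "--bits", "4",
        "--iters", "1000",
        "--deployment_device", "cpu",
        "--output_dir", str(models / f"{model_dir}-int4"),
        "--train_bs", "1",
        "--gradient_accumulate_steps", "8",
    ]


if __name__ == "__main__":
    # Print the command as a single shell-quoted line instead of running it.
    print(shlex.join(build_quantize_command(".")))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would start the quantization. That call is left out here because it needs the downloaded weights and whatever device `--device 0` refers to on the machine.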