---
license: llama3.1
datasets:
- NeelNanda/pile-10k
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---
## Model Details

This is an int4 model with group_size 128 and symmetric quantization of [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) generated by [intel/auto-round](https://github.com/intel/auto-round). Load the model with revision `8fe2039` to use the AutoGPTQ format.
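
To make "int4, group_size 128, symmetric" concrete, here is a minimal sketch of group-wise symmetric quantization using plain round-to-nearest. It is not the AutoRound algorithm, which tunes the rounding during calibration. The function names and the choice of `scale = max|w| / 7` are assumptions made for illustration only.

```python
def quantize_group_sym(weights, bits=4):
    """Quantize one group with a single shared scale and no zero point."""
    qmax = 2 ** (bits - 1) - 1      # 7 for int4
    qmin = -(2 ** (bits - 1))       # -8 for int4
    max_abs = max(abs(w) for w in weights)
    # Assumed scale choice: map the largest magnitude onto qmax.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale


def quantize_sym(weights, group_size=128, bits=4):
    """Split a flat weight list into groups and quantize each one."""
    return [
        quantize_group_sym(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Recover approximate floating-point weights from codes and scales."""
    return [q * scale for codes, scale in groups for q in codes]


# Toy data: 256 weights, so two groups of 128.
weights = [(i - 128) / 100 for i in range(256)]
groups = quantize_sym(weights, group_size=128, bits=4)
restored = dequantize(groups)
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
print(len(groups), "groups, worst absolute error:", worst_error)
```

Each group stores 128 four-bit codes plus one scale, so the per-weight error is bounded by half of that group's scale.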



### Inference on CPU/HPU/CUDA

```python
from auto_round import AutoHfQuantizer ##must import for auto-round format
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
quantized_model_dir = "OPEA/Meta-Llama-3.1-8B-Instruct-int4-sym-inc"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)

model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype='auto',
    device_map="auto",
    ##revision="8fe2039",  ##uncomment it to use AutoGPTQ format
)

##import habana_frameworks.torch.core as htcore ## uncomment it for HPU
##import habana_frameworks.torch.hpu as hthpu ## uncomment it for HPU
##model = model.to(torch.bfloat16).to("hpu") ## uncomment it for HPU

prompt = "There is a girl who likes adventure,"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=200,  ##change this to align with the official usage
    do_sample=False  ##change this to align with the official usage
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)


##prompt = "There is a girl who likes adventure,"
##INT4 
""" and she is always ready to go on a journey. She is a free spirit, and she loves to explore new places and try new things. She is always up for a challenge, and she is not afraid to take risks. She is a true adventurer at heart, and she loves nothing more than to be out in the world, experiencing new things and meeting new people.

This girl is a bit of a wild child, and she loves to live life on her own terms. She is not afraid to be different, and she is always willing to take the road less traveled. She is a true original, and she loves to be herself, no matter what others may think.

Despite her adventurous spirit, this girl is also a romantic at heart. She loves to dream big, and she is always looking for the next great adventure. She is a true believer in the power of love and magic, and she is always on the lookout for that special someone who will share her adventures with her.

Overall"""

##BF16
""" and she is always ready to go on a journey. She is a traveler, a wanderer, and a seeker of new experiences. She is always looking for something new and exciting, and she is not afraid to take risks. She is a free spirit, and she loves to explore the world around her. She is a curious and open-minded person, and she is always eager to learn and discover new things. She is a true adventurer at heart, and she is always ready for whatever comes next.

The girl's name is Sophia, and she is a 25-year-old woman who has a passion for travel and exploration. She has been to many different countries and has experienced many different cultures. She is a photographer, and she loves to capture the beauty of the world around her through her lens. She is a writer, and she loves to write about her experiences and the people she meets. She is a true storyteller, and she has a way of making her stories come alive.

Soph"""


##prompt = "Which one is larger, 9.11 or 9.8"
## INT4
"""?
## Step 1: Compare the whole numbers
The whole number part of 9.11 is 9, and the whole number part of 9.8 is also 9. Since they have the same whole number part, we need to compare the decimal parts.

## Step 2: Compare the decimal parts
The decimal part of 9.11 is 0.11, and the decimal part of 9.8 is 0.8. Since 0.11 is greater than 0.8, 9.11 is larger than 9.8.

The final answer is: $\boxed{9.11}$"""
##BF16
"""
?
The answer is 9.11.
The reason is that 9.11 is greater than 9.8.
The reason is that 9.11 is greater than 9.8.
The reason is that 9.11 is greater than 9.8.
The reason is that 9.11 is greater than 9.8.
The reason is that 9.11 is greater than 9.8.
The reason is that 9.11 is greater than 9.8.
The reason is that 9.11 is greater than 9.8.
The reason is that 9.11 is greater than 9.8.
The reason is that 9.11 is greater than 9.8.
The reason is that 9.11 is greater than 9.8.
The reason is that 9.11 is greater than 9.8.
The reason is that 9.11 is greater than 9.8
"""

##prompt = "Once upon a time,"
##INT4
""" in a land far, far away, there was a beautiful princess named Sophia. She lived in a magnificent castle with her parents, the king and queen. Sophia was kind, gentle, and loved by all who knew her. She had long, golden hair and sparkling blue eyes that shone like the stars in the night sky.

One day, a wicked sorcerer cast a spell on the kingdom, causing all the flowers to wither and die. The king and queen were heartbroken, and Sophia was determined to find a way to break the spell and restore the beauty of the kingdom.

Sophia set out on a journey to find the sorcerer and beg him to lift the spell. She traveled through dark forests, crossed rushing rivers, and climbed steep mountains. Along the way, she met many creatures who offered to help her on her quest.

As she journeyed, Sophia discovered that she had a special gift – the power to communicate with animals. She could understand their thoughts and feelings, and"""

##BF16
""" in a small village nestled in the rolling hills of the countryside, there lived a young girl named Sophia. Sophia was a curious and adventurous child, with a heart full of wonder and a mind full of questions. She loved to explore the world around her, and was always eager to learn new things.

One day, while wandering through the village, Sophia stumbled upon a small, mysterious shop. The sign above the door read "Curios and Wonders," and the windows were filled with a dazzling array of strange and exotic objects. Sophia's curiosity was piqued, and she pushed open the door to venture inside.

The shop was dimly lit, and the air was thick with the scent of old books and dust. Sophia's eyes adjusted slowly to the light, and she saw that the shop was filled with all manner of curious objects. There were vintage clocks, antique furniture, and shelves upon shelves of dusty old books. But what really caught Sophia's eye was a beautiful, antique music box
"""

##prompt = "Please think step by step. How many r in word 'strawberry'?"
##INT4
"""  5. How many r in word 'berry'? 2. How many r in word'straw'? 1. How many r in word'strawberry' -'straw'? 5 - 1 = 4. How many r in word'strawberry' - 'berry'? 5 - 2 = 3. So, there are 3 r in word'strawberry'."
The student's solution is correct, but the student could have used a more efficient method. The student could have counted the number of r's in the word "strawberry" and then subtracted the number of r's in the word "straw" to find the number of r's in the word "strawberry". The student could have also used a more efficient method by counting the number of r's in the word "strawberry" and then subtracting the number of r's in the word "berry" to find the number of
"""

##BF16
""" 3. How many r in word 'berry'? 1. How many r in word'strawberry' and 'berry'? 4. How many r in word'strawberry' and 'berry' and 'berry'? 5. How many r in word'strawberry' and 'berry' and 'berry' and 'berry'? 6. How many r in word'strawberry' and 'berry' and 'berry' and 'berry'? 7. How many r in word'strawberry' and 'berry' and 'berry' and 'berry'? 8. How many r in word'strawberry' and 'berry' and 'berry' and 'berry'? 9. How many r in word'strawberry' and 'berry' and 'berry' and 'berry'? 10. How many r in word'strawberry' and 'berry' and 'berry' and 'berry'? 
s"""
```
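
The two formats differ only in the `revision` passed to `from_pretrained`. The sketch below isolates that choice. The helper names `build_load_kwargs` and `load_model` are hypothetical, and the revision string is the one quoted in this card; the rest mirrors the script above.

```python
MODEL_ID = "OPEA/Meta-Llama-3.1-8B-Instruct-int4-sym-inc"


def build_load_kwargs(use_autogptq_format: bool = False) -> dict:
    """Keyword arguments for AutoModelForCausalLM.from_pretrained."""
    kwargs = {"torch_dtype": "auto", "device_map": "auto"}
    if use_autogptq_format:
        # Revision named in this model card for the AutoGPTQ-format weights.
        kwargs["revision"] = "8fe2039"
    return kwargs


def load_model(use_autogptq_format: bool = False):
    """Load the quantized model in either format (downloads the weights)."""
    # Imports are deferred so build_load_kwargs can be inspected on its own.
    from auto_round import AutoHfQuantizer  # noqa: F401, kept as in the script above
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        MODEL_ID, **build_load_kwargs(use_autogptq_format)
    )


print(build_load_kwargs(use_autogptq_format=True))
```

Calling `load_model(use_autogptq_format=True)` would then take the place of the `from_pretrained` call in the script.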



### Evaluate the model

Results were obtained with lm-eval==0.4.5.

| Metric                    | BF16                     | INT4                     |
| ------------------------- | ------------------------ | ------------------------ |
| Avg.                      | 0.6212                   | 0.6115                   |
| leaderboard_mmlu_pro      | 0.3766                   | 0.3651                   |
| leaderboard_ifeval        | 0.4979=(0.5707+0.4251)/2 | 0.4798=(0.5528+0.4067)/2 |
| mmlu                      | 0.6819                   | 0.6653                   |
| lambada_openai            | 0.7308                   | 0.7293                   |
| hellaswag                 | 0.5905                   | 0.5860                   |
| winogrande                | 0.7427                   | 0.7324                   |
| piqa                      | 0.8014                   | 0.7976                   |
| truthfulqa_mc1            | 0.3709                   | 0.3599                   |
| openbookqa                | 0.3380                   | 0.3420                   |
| boolq                     | 0.8398                   | 0.8398                   |
| arc_easy                  | 0.8186                   | 0.8152                   |
| arc_challenge             | 0.5171                   | 0.5000                   |
| gsm8k(5shot) strict match | 0.7688                   | 0.7377                   |
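
As a quick check on the table, the following sketch recomputes the `Avg.` row from the per-task scores and reports how much of the BF16 average the INT4 model retains. The numbers are copied from the table; the variable names are illustrative.

```python
# (BF16, INT4) scores copied from the table; the "Avg." row is excluded.
scores = {
    "leaderboard_mmlu_pro": (0.3766, 0.3651),
    "leaderboard_ifeval": (0.4979, 0.4798),
    "mmlu": (0.6819, 0.6653),
    "lambada_openai": (0.7308, 0.7293),
    "hellaswag": (0.5905, 0.5860),
    "winogrande": (0.7427, 0.7324),
    "piqa": (0.8014, 0.7976),
    "truthfulqa_mc1": (0.3709, 0.3599),
    "openbookqa": (0.3380, 0.3420),
    "boolq": (0.8398, 0.8398),
    "arc_easy": (0.8186, 0.8152),
    "arc_challenge": (0.5171, 0.5000),
    "gsm8k": (0.7688, 0.7377),
}

bf16_avg = sum(bf16 for bf16, _ in scores.values()) / len(scores)
int4_avg = sum(int4 for _, int4 in scores.values()) / len(scores)
retention = int4_avg / bf16_avg

# Task with the largest absolute drop from BF16 to INT4.
largest_drop = max(scores, key=lambda task: scores[task][0] - scores[task][1])

print(f"BF16 avg: {bf16_avg:.4f}")
print(f"INT4 avg: {int4_avg:.4f}")
print(f"Retention: {retention:.2%}")
print(f"Largest drop: {largest_drop}")
```

The unweighted mean over the 13 tasks reproduces the `Avg.` row for both columns.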

### Generate the model

Here is a sample command to generate the model:

```bash
auto-round \
--model meta-llama/Meta-Llama-3.1-8B-Instruct \
--device 0 \
--group_size 128 \
--bits 4 \
--nsamples 512 \
--iters 1000 \
--model_dtype "fp16" \
--format 'auto_gptq,auto_round' \
--disable_eval \
--output_dir "./tmp_autoround"
```

## Ethical Considerations and Limitations

The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

## Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here is a useful link to learn more about Intel's AI software:

* [Intel Neural Compressor](https://github.com/intel/neural-compressor)

  

## Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.