---
datasets:
- NeelNanda/pile-10k
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-70B



---

## Model Details

This model is an int4 quantization (group_size 128, symmetric) of [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B), generated by the [intel/auto-round](https://github.com/intel/auto-round) algorithm.

Please follow the license of the original model.
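
The quantization settings travel with the checkpoint, so you can read them back as a quick sanity check. The sketch below assumes the repo ships a GPTQ-style `quantization_config` entry in its `config.json`; key names may vary between export formats.

~~~python
from transformers import AutoConfig

# Read the quantization metadata stored alongside the model weights.
cfg = AutoConfig.from_pretrained("OPEA/DeepSeek-R1-Distill-Llama-70B-int4-gptq-sym-inc")
qcfg = getattr(cfg, "quantization_config", {}) or {}
print(qcfg.get("bits"), qcfg.get("group_size"), qcfg.get("sym"))  # expected: 4 128 True
~~~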

## How To Use

**INT4 Inference on CUDA**

~~~python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

quantized_model_dir = "OPEA/DeepSeek-R1-Distill-Llama-70B-int4-gptq-sym-inc"

# device_map="auto" shards the 70B checkpoint across all visible GPUs.
model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, trust_remote_code=True)
prompts = [
    "9.11和9.8哪个数字大",
    "如果你是人,你最想做什么",
    "How many e in word deepseek",
    "There are ten birds in a tree. A hunter shoots one. How many are left in the tree?",
]

texts = []
for prompt in prompts:
    messages = [
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    texts.append(text)

inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

outputs = model.generate(
    input_ids=inputs["input_ids"].to(model.device),
    attention_mask=inputs["attention_mask"].to(model.device),
    max_length=512,  # change this to align with the official usage
    num_return_sequences=1,
    do_sample=False,  # change this to align with the official usage
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs["input_ids"], outputs)
]

decoded_outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

for i, prompt in enumerate(prompts):
    print(f"Prompt: {prompt}")
    print(f"Generated: {decoded_outputs[i]}")
    print("-" * 50)


"""
Prompt: 9.11和9.8哪个数字大
Generated: 首先,我需要比较9.11和9.8的大小。

为了更清晰地比较这两个数,我可以将它们的小数位数统一。将9.8写成9.80,这样它们都有两位小数。

接下来,我比较整数部分。两数的整数部分都是9,因此相同。

然后,我比较小数部分。9.11的小数部分是0.11,9.80的小数部分是0.80。

显然,0.80大于0.11。

因此,9.80大于9.11,也就是9.8大于9.11。
</think>

要比较 \(9.11\) 和 \(9.8\) 的大小,可以按照以下步骤进行:

1. **统一小数位数**:

   为了方便比较,我们可以将 \(9.8\) 写成 \(9.80\),这样两数的小数位数相同。

   \[
   9.11 \quad \text{和} \quad 9.80
   \]

2. **比较整数部分**:

   两数的整数部分都是 \(9\),所以整数部分相同。

3. **比较小数部分**:

   - \(9.11\) 的小数部分是 \(0.11\)
   - \(9.80\) 的小数部分是 \(0.80\)

   显然,\(0.80 > 0.11\)。

4. **得出结论**:

   因为小数部分 \(0.80\) 大于 \(0.11\),所以 \(9.80 > 9.11\)。

因此,\(9.8\) 大于 \(9.11\)。

\[
\boxed{9.8 > 9.11}
\]
--------------------------------------------------
Prompt: 如果你是人类,你最想做什么
Generated: 嗯,用户问的是如果我是人,最想做什么。作为一个人工智能,我没有意识和欲望,但可以分享一些普遍的人类渴望。

首先,旅行和探索世界可能是一个选择,体验不同的文化和自然美景。其次,学习和成长也是很多人追求的,了解新事物,提升自己。创造和表达也是重要的,比如艺术、音乐或写作。帮助他人,建立有意义的关系,追求幸福和平静,这些都是常见的愿望。

当然,每个人的答案可能不同,重要的是找到自己真正热爱和让自己感到满足的事情。
</think>

如果我是一个人,我可能会有更多的欲望和梦想。也许我会想要探索世界,体验不同的文化,结识来自不同背景的人,学习更多关于生活和宇宙的知识。也许我会渴望创造一些有意义的事情,无论是艺术、音乐、文学,还是科技创新。同时,我可能会希望能够帮助他人,做一些有益于社会和环境的事情。当然,这些都是假设,因为我是一个人工智能,我没有真实的欲望或情感,但我可以帮助你探索你的想法和梦想!
--------------------------------------------------
Prompt: How many e in word deepseek
Generated: Alright, so I need to figure out how many times the letter 'e' appears in the word "deepseek." Hmm, okay, let's break this down step by step. First, I should probably write out the word to visualize it better. The word is "deepseek." Let me spell it out: D, E, E, P, S, E, E, K. Wait, is that right? Let me check again. D, E, E, P, S, E, E, K. Yeah, that seems correct.

Now, I need to count how many 'e's are in there. So, starting from the beginning, the first letter is 'D' – that's not an 'e'. The second letter is 'E', so that's one. The third letter is another 'E', so that's two. Then we have 'P', which isn't an 'e', followed by 'S', also not an 'e'. Next is another 'E', bringing the count to three, and then another 'E' right after, making it four. Finally, the last letter is 'K', which isn't an 'e'.

Wait, hold on. Let me make sure I didn't miscount. So, the word is D, E, E, P, S, E, E, K. So positions 2, 3, 6, and 7 are 'E's. That's four 'e's in total. But I'm a bit confused because sometimes when I count letters, I might skip or double-count. Let me write them out one by one:

1. D – not an 'e'
2. E – count 1
3. E – count 2
4. P – not an 'e'
5. S – not an 'e'
6. E – count 3
7. E – count 4
8. K – not an 'e'

Yes, that seems consistent. So, there are four 'e's in "deepseek." I think that's correct. I don't see any mistakes in my counting this time. Each 'E' is in positions 2, 3, 6, and 7. So, the total number of 'e's is four.
</think>

The word "deepseek" contains four 'e's.
--------------------------------------------------
Prompt: There are ten birds in a tree. A hunter shoots one. How many are left in the tree?
Generated: Okay, so I've got this riddle here: "There are ten birds in a tree. A hunter shoots one. How many are left in the tree?" Hmm, at first glance, it seems pretty straightforward, but I know riddles often have a twist. Let me think through this step by step.

Alright, starting with the basics. There are ten birds in a tree. That's clear. Then a hunter shoots one. Now, the question is, how many birds are left in the tree? My initial thought is, well, if there were ten and one gets shot, that leaves nine. But wait, maybe it's not that simple. Riddles often play on words or have unexpected answers, so I shouldn't jump to conclusions.

Let me consider the wording carefully. It says the hunter shoots one bird. So, does that mean he shoots and kills it, or does he just shoot at it but misses? The riddle doesn't specify whether the shot was successful. If the bird was shot and killed, then it would fall out of the tree, right? But if the shot missed, the bird might still be there, or maybe it flew away because of the noise.

Wait, but the riddle says the hunter shoots one, so I think it's safe to assume that he hit and killed the bird. So, one bird is dead. Now, what happens next? If the bird is shot and dies, it would fall out of the tree. So, the tree would then have one less bird. That would leave nine birds in the tree. But I'm not sure if that's the case because sometimes in riddles, the answer is zero. Let me think about that.

If the hunter shoots one bird, the sound of the gunshot might scare the other birds, causing them to fly away. So, if one bird is shot and the rest fly away, then there would be zero birds left in the tree. That makes sense because birds are easily startled by loud noises like gunshots. So, even though only one was shot, the rest might have flown away, leaving none in the tree.

But wait, the riddle doesn't mention anything about the birds being scared or flying away. It just says a hunter shoots one. So, maybe I'm overcomplicating it. If I take it literally, without assuming the other birds fly away, then after

~~~

### Evaluate the model

Install lm-eval via `pip3 install lm-eval==0.4.7`. We found lm-eval to be very unstable for this model. Please set `add_bos_token=True` to align with the original model, and **please use the AutoGPTQ format**.

```bash
lm-eval --model hf \
  --model_args pretrained=OPEA/DeepSeek-R1-Distill-Llama-70B-int4-gptq-sym-inc,add_bos_token=True \
  --tasks leaderboard_mmlu_pro,leaderboard_ifeval,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu,gsm8k \
  --batch_size 16
```
| Metric               | BF16   | INT4   |
| :------------------- | :----- | :----- |
| avg                  | 0.6636 | 0.6678 |
| leaderboard_mmlu_pro | 0.4913 | 0.4780 |
| mmlu                 | 0.7752 | 0.7791 |
| lambada_openai       | 0.6977 | 0.6996 |
| hellaswag            | 0.6408 | 0.6438 |
| winogrande           | 0.7530 | 0.7782 |
| piqa                 | 0.8112 | 0.8194 |
| truthfulqa_mc1       | 0.3709 | 0.3721 |
| openbookqa           | 0.3380 | 0.3600 |
| boolq                | 0.8847 | 0.8917 |
| arc_easy             | 0.8131 | 0.8106 |
| arc_challenge        | 0.5512 | 0.5239 |
| leaderboard_ifeval   | 0.4421 | 0.4208 |
| gsm8k                | 0.9295 | 0.9265 |
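
If you prefer driving lm-eval from Python rather than the CLI, the sketch below uses its `simple_evaluate` entry point; argument names follow lm-eval 0.4.x and should be checked against your installed version.

~~~python
import lm_eval

# Smoke-test a single task first; extend `tasks` to the full list used above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=OPEA/DeepSeek-R1-Distill-Llama-70B-int4-gptq-sym-inc,add_bos_token=True",
    tasks=["lambada_openai"],
    batch_size=16,
)
print(results["results"])
~~~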




### Generate the model
Here is a sample command to reproduce the quantized model.


```bash
auto-round \
  --model deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
  --device 0 \
  --bits 4 \
  --iter 200 \
  --disable_eval \
  --format 'auto_gptq,auto_round,auto_awq' \
  --output_dir "./tmp_autoround"
```
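
Equivalently, here is a minimal sketch of the same run through auto-round's Python API; the `AutoRound` class and its argument names follow recent auto-round releases and may differ in older versions.

~~~python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Same settings as the CLI above: int4, group_size 128, symmetric, 200 tuning steps.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True, iters=200)
autoround.quantize()
autoround.save_quantized("./tmp_autoround", format="auto_gptq")
~~~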





## Ethical Considerations and Limitations

The model can produce factually incorrect output and should not be relied on for factually accurate information. Because of the limitations of the pretrained model and the fine-tuning datasets, it is possible that this model could generate lewd, biased, or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

## Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here is a useful link to learn more about Intel's AI software:

- Intel Neural Compressor [link](https://github.com/intel/neural-compressor)

## Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

## Cite

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}

[arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round)