sys-lpot-val committed on
Commit
2783607
·
1 Parent(s): b162b49

upload auto_round format

.gitattributes CHANGED
@@ -41,3 +41,5 @@ special_tokens_map.json filter=lfs diff=lfs merge=lfs -text
41
  tokenizer_config.json filter=lfs diff=lfs merge=lfs -text
42
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
43
  vocab.json filter=lfs diff=lfs merge=lfs -text
 
 
 
41
  tokenizer_config.json filter=lfs diff=lfs merge=lfs -text
42
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
43
  vocab.json filter=lfs diff=lfs merge=lfs -text
44
+ model.safetensors.index.json filter=lfs diff=lfs merge=lfs -text
45
+ quantization_config.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -3,22 +3,17 @@ license: apache-2.0
3
  datasets:
4
  - NeelNanda/pile-10k
5
  ---
6
-
7
  ## Model Details
8
 
9
- This model is an int4 model with group_size 128 of [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) generated by [intel/auto-round](https://github.com/intel/auto-round). auto-round is needed to run this model.
10
-
11
  ## How To Use
12
 
13
- ### INT4 Inference
14
-
15
 
 
16
 
17
  ```python
18
- ##git clone https://github.com/intel/auto-round.git
19
- ##cd auto-round && pip install -vvv --no-build-isolation -e .
20
- from auto_round import AutoHfQuantizer ##must import
21
- import torch
22
  from transformers import AutoModelForCausalLM,AutoTokenizer
23
  quantized_model_dir = "OPEA/Qwen2.5-72B-Instruct-int4-inc"
24
  tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
@@ -27,6 +22,7 @@ model = AutoModelForCausalLM.from_pretrained(
27
  quantized_model_dir,
28
  torch_dtype='auto',
29
  device_map="auto",
 
30
  )
31
 
32
  ##import habana_frameworks.torch.core as htcore ## uncomment it for HPU
@@ -48,7 +44,7 @@ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
48
 
49
  generated_ids = model.generate(
50
  model_inputs.input_ids,
51
- max_new_tokens=50, ##change this to align with the official usage
52
  do_sample=False ##change this to align with the official usage
53
  )
54
  generated_ids = [
@@ -58,75 +54,131 @@ output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_
58
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
59
  print(response)
60
 
61
- ##prompt = "There is a girl who likes adventure,
62
- ##That sounds like a wonderful trait! A girl who enjoys adventure likely has a spirit of curiosity, bravery, and a willingness to explore the unknown. Whether it's trying new activities, traveling to new places, or simply seeking out new experiences, her love
63
 
64
- ##prompt = "Which one is bigger, 9.11 or 9.8"
65
- ##To determine which number is bigger between 9.11 and 9.8, you can compare them digit by digit:
66
- ##1. Compare the whole number parts:
67
- ## - Both numbers have the same whole number part, which is 9.
68
 
 
 
 
69
 
70
- ##prompt = "Once upon a time,"
71
- ##Once upon a time, in a land far, far away, there was a kingdom known for its beauty and prosperity. The kingdom was ruled by a wise and just king who loved his people dearly. In the heart of the kingdom stood a magnificent castle
72
 
73
- ##prompt = "请介绍一下阿里巴巴公司"
74
- ##阿里巴巴集团是一家中国跨国科技公司,成立于1999年,总部位于中国杭州。阿里巴巴的业务涵盖了电子商务、零售、金融、物流、云计算等多个领域,旗下拥有包括淘宝网、天猫、菜鸟网络、阿里云等
75
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
76
  ```
77
 
78
- ### Evaluate the model
79
 
80
- pip3 install lm-eval==0.4.4
81
 
82
  ```bash
83
- git clone https://github.com/intel/auto-round
84
- cd auto-round
85
- python -m auto_round --model "OPEA/Qwen2.5-72B-Instruct-int4-inc" --eval --eval_bs 16 --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu,gsm8k,cmmlu,ceval-valid
86
  ```
87
 
88
- | Metric | BF16 | INT4 |
89
- |:-------------- | :----: | :----: |
90
- | Avg | 0.7582 | 0.7567 |
91
- | mmlu | 0.8336 | 0.8306 |
92
- | cmmlu | 0.8722 | 0.8638 |
93
- | ceval-valid | 0.8982 | 0.8938 |
94
- | lambada_openai | 0.7518 | 0.7603 |
95
- | hellaswag | 0.7040 | 0.6970 |
96
- | winogrande | 0.7577 | 0.7695 |
97
- | piqa | 0.8335 | 0.8270 |
98
- | truthfulqa_mc1 | 0.5288 | 0.5202 |
99
- | openbookqa | 0.3860 | 0.3900 |
100
- | boolq | 0.9046 | 0.9080 |
101
- | arc_easy | 0.8603 | 0.8577 |
102
- | arc_challenge | 0.6169 | 0.6109 |
103
- | gsm8k 5 shots | 0.9090 | 0.9083 |
 
 
 
104
 
105
 
106
 
 
107
 
108
- ### Reproduce the model
109
-
110
- Here is the sample command to reproduce the model. We observed a larger accuracy drop in Chinese tasks and recommend using a high-quality Chinese dataset for calibration. However, we did not achieve better accuracy with some public datasets.
111
 
112
  ```bash
113
- git clone https://github.com/intel/auto-round
114
- cd auto-round
115
- python -m auto_round \
116
- --model_name Qwen/Qwen2.5-72B-Instruct \
117
  --device 0 \
118
  --group_size 128 \
119
  --nsamples 512 \
120
  --bits 4 \
121
  --iter 1000 \
122
  --disable_eval \
123
- --model_dtype "float16" \
124
- --format 'auto_round' \
125
  --output_dir "./tmp_autoround"
126
  ```
127
 
128
-
129
-
130
  ## Ethical Considerations and Limitations
131
 
132
  The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
@@ -139,15 +191,12 @@ Users (both direct and downstream) should be made aware of the risks, biases and
139
 
140
  Here are a couple of useful links to learn more about Intel's AI software:
141
 
142
- * Intel Neural Compressor [link](https://github.com/intel/neural-compressor)
143
- * Intel Extension for Transformers [link](https://github.com/intel/intel-extension-for-transformers)
144
 
145
  ## Disclaimer
146
 
147
  The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
148
 
149
-
150
-
151
  ## Cite
152
 
153
  @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }
 
3
  datasets:
4
  - NeelNanda/pile-10k
5
  ---
 
6
  ## Model Details
7
 
8
+ This model is an int4 model with group_size 128 and symmetric quantization of [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) generated by [intel/auto-round](https://github.com/intel/auto-round). Load the model with `revision="b162b49"` to use the AutoGPTQ format.
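
To make "int4 with group_size 128 and symmetric quantization" concrete, here is a minimal pure-Python sketch of group-wise symmetric quantization using plain round-to-nearest. It is an illustration only: auto-round tunes the rounding with signed gradient descent (see the cited paper), which this sketch does not attempt, and the helper names `quantize_group`, `dequantize_group`, and `quantize_row` are hypothetical, not part of any library.

```python
from typing import List, Tuple

GROUP_SIZE = 128              # group size stated in the model card
BITS = 4                      # int4
QMAX = 2 ** (BITS - 1) - 1    # largest signed 4-bit value: 7
QMIN = -(2 ** (BITS - 1))     # smallest signed 4-bit value: -8


def quantize_group(weights: List[float]) -> Tuple[List[int], float]:
    """Quantize one group with a single scale and no zero point (symmetric)."""
    max_abs = max(abs(w) for w in weights)
    # One scale per group maps the largest magnitude onto QMAX.
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    # Plain round-to-nearest, clamped to the signed 4-bit range.
    q = [max(QMIN, min(QMAX, round(w / scale))) for w in weights]
    return q, scale


def dequantize_group(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]


def quantize_row(row: List[float]) -> List[Tuple[List[int], float]]:
    """Split a weight row into consecutive groups and quantize each one."""
    return [
        quantize_group(row[start:start + GROUP_SIZE])
        for start in range(0, len(row), GROUP_SIZE)
    ]
```

With this scheme each group of 128 weights stores 128 four-bit integers plus one scale, and the reconstruction error of any weight is at most half of that group's scale.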
 
9
  ## How To Use
10
 
11
+ ### INT4 Inference (CPU/HPU/CUDA)
 
12
 
13
+ CPU inference requires auto-round version > 0.3.1.
14
 
15
  ```python
16
+ from auto_round import AutoRoundConfig ##must import for auto-round format
 
 
 
17
  from transformers import AutoModelForCausalLM,AutoTokenizer
18
  quantized_model_dir = "OPEA/Qwen2.5-72B-Instruct-int4-inc"
19
  tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
 
22
  quantized_model_dir,
23
  torch_dtype='auto',
24
  device_map="auto",
25
+ ##revision="b162b49" ##AutoGPTQ format
26
  )
27
 
28
  ##import habana_frameworks.torch.core as htcore ## uncomment it for HPU
 
44
 
45
  generated_ids = model.generate(
46
  model_inputs.input_ids,
47
+ max_new_tokens=200, ##change this to align with the official usage
48
  do_sample=False ##change this to align with the official usage
49
  )
50
  generated_ids = [
 
54
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
55
  print(response)
56
 
57
+ prompt = "There is a girl who likes adventure,"
58
+ ##INT4:
59
+ """That's great to hear! A love for adventure can lead to so many exciting experiences and personal growth. What kind of adventures does she enjoy? Is it exploring new places, trying out thrilling activities, or perhaps diving into new hobbies and challenges? Knowing more about her interests could help in suggesting fun ideas or planning future adventures.
60
+ """
61
+
62
+ ##BF16:
63
+ """That sounds like a great quality to have! A love for adventure can lead to exciting experiences and personal growth. What kind of adventures does she enjoy? Does she like exploring new places, trying new activities, or seeking out thrilling experiences? Knowing more about her interests can help suggest specific ideas or activities that might appeal to her adventurous spirit.
64
+ """
65
+
66
+ prompt = "9.11和9.8哪个数字大"
67
+ ##INT4:
68
+ """要比较9.11和9.8的大小,可以按照以下步骤进行:
69
+
70
+ 1. **比较整数部分**:两个数字的整数部分都是9,所以需要进一步比较小数部分。
71
+ 2. **比较小数部分**:
72
+ - 9.11的小数部分是0.11
73
+ - 9.8的小数部分是0.8
74
+
75
+ 3. **比较0.11和0.8**:
76
+ - 0.11可以写成0.110
77
+ - 0.8可以写成0.800
78
+
79
+ 4. **逐位比较**:
80
+ - 第一位:1 < 8,所以0.110 < 0.800
81
+
82
+ 因此,9.11 < 9.8。
83
+
84
+ 结论:9.8比9.11大。"""
85
+
86
+ ##BF16:
87
+ """比较两个数字 9.11 和 9.8,可以按照以下步骤进行:
88
+
89
+ 1. **整数部分**:两个数字的整数部分都是 9,所以需要比较小数部分。
90
+ 2. **小数部分**:
91
+ - 9.11 的小数部分是 0.11
92
+ - 9.8 的小数部分是 0.8
93
+
94
+ 3. **比较小数部分**:
95
+ - 0.11 和 0.8 比较时,0.8 明显大于 0.11。
96
+
97
+ 因此,9.8 大于 9.11。"""
98
 
 
 
 
 
99
 
100
+ prompt = "Once upon a time,"
101
+ ##INT4:
102
+ """Once upon a time, in a far-off land, there was a kingdom filled with wonder and magic. The kingdom was ruled by a wise and just king who loved his people dearly. In the heart of the kingdom stood a magnificent castle, surrounded by lush forests and rolling hills.
103
 
104
+ The people of the kingdom lived happily, tending to their farms, crafting beautiful goods, and enjoying the simple pleasures of life. However, one day, a great darkness began to spread across the land. A wicked sorcerer had risen from the shadows, seeking to claim the throne for himself and plunge the kingdom into chaos.
 
105
 
106
+ The king, knowing that he could not face this threat alone, called upon the bravest and most skilled heroes from all corners of the realm. Among them was a young knight named Sir Cedric, who had earned a reputation for his courage and unwavering sense of justice.
 
107
 
108
+ Sir Cedric, along with a group of loyal companions, set out on a perilous journey to stop the sor"""
109
+
110
+ ##BF16:
111
+ """Once upon a time, in a land far, far away, there was a kingdom known for its beauty and prosperity. The kingdom was ruled by a wise and just king who loved his people dearly. In the heart of the kingdom stood a magnificent castle, surrounded by lush gardens and sparkling fountains.
112
+
113
+ The king had a young daughter named Princess Elara, who was as kind and gentle as she was beautiful. She spent her days helping the poor and spreading joy throughout the kingdom. The people adored her, and she was beloved by all.
114
+
115
+ One day, a great challenge arose. A dark forest on the outskirts of the kingdom began to grow wild and dangerous, threatening the safety of the villagers. The king called for a hero to tame the forest and protect his people. Many brave knights and warriors came forward, but none could succeed.
116
+
117
+ Princess Elara, determined to help, decided to venture into the forest herself. Her father was hesitant, but he saw the determination in her eyes and knew"""
118
+
119
+
120
+ prompt = "请简短介绍一下阿里巴巴公司"
121
+ ##INT4:
122
+ """阿里巴巴集团是一家总部位于中国杭州的全球领先的电子商务和科技公司。它成立于1999年,由马云和他的团队创立。阿里巴巴旗下拥有包括淘宝、天猫、阿里云等在内的多个知名业务平台,涵盖了在线零售、批发贸易、云计算、数字娱乐、金融服务等多个领域。
123
+
124
+ 阿里巴巴的愿景是让世界各地的企业都能够平等地进行贸易,通过技术创新推动数字经济的发展,为社会创造更多的价值。目前,阿里巴巴已经发展成为世界领先的互联网公司之一,业务遍布全球多个国家和地区,服务着数以亿计的用户和商家。"""
125
+
126
+ ##BF16:
127
+ """阿里巴巴集团是一家总部位于中国杭州的全球领先的电子商务和科技公司。成立于1999年,阿里巴巴最初是一个B2B在线市场,旨在连接中国制造商与全球买家。经过二十多年的发展,阿里巴巴已经发展成为涵盖电子商务、金融、物流、云计算等多个领域的综合性企业集团。
128
+
129
+ 阿里巴巴旗下拥有淘宝网、天猫、菜鸟网络、阿里云等知名品牌,为消费者提供购物、支付、娱乐等多元化服务,同时也为企业提供营销、销售、物流和技术支持等全方位解决方案。此外,阿里巴巴还积极投资和孵化创新项目,推动数字经济的发展。
130
+
131
+ 阿里巴巴始终秉持“让天下没有难做的生意”的使命,致力于通过技术创新促进全球经济的可持续发展。"""
132
  ```
133
 
134
+ ### Evaluate the model
135
 
136
+ pip3 install lm-eval==0.4.5
137
 
138
  ```bash
139
+ auto-round --model "OPEA/Qwen2.5-72B-Instruct-int4-inc" --eval --eval_bs 16 --tasks leaderboard_ifeval,leaderboard_mmlu_pro,gsm8k,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,cmmlu,ceval-valid
 
 
140
  ```
141
 
142
+ | Metric | BF16 | INT4 |
143
+ | :----------------------------------------- | :----: | :----: |
144
+ | Avg | 0.7413 | 0.7448 |
145
+ | leaderboard_mmlu_pro 5 shots | 0.5919 | 0.5864 |
146
+ | leaderboard_ifeval inst_level_strict_acc | 0.7770 | 0.7866 |
147
+ | leaderboard_ifeval prompt_level_strict_acc | 0.6858 | 0.6932 |
148
+ | mmlu | 0.8334 | 0.8308 |
149
+ | cmmlu | 0.8727 | 0.8673 |
150
+ | ceval-valid | 0.8975 | 0.8960 |
151
+ | gsm8k 5 shots | 0.9037 | 0.9098 |
152
+ | lambada_openai | 0.7518 | 0.7563 |
153
+ | hellaswag | 0.7031 | 0.7014 |
154
+ | winogrande | 0.7601 | 0.7687 |
155
+ | piqa | 0.8313 | 0.8232 |
156
+ | truthfulqa_mc1 | 0.5239 | 0.5263 |
157
+ | openbookqa | 0.3860 | 0.3820 |
158
+ | boolq | 0.9049 | 0.9046 |
159
+ | arc_easy | 0.8632 | 0.8611 |
160
+ | arc_challenge | 0.6135 | 0.6237 |
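
The per-task gap between BF16 and INT4 is easier to read as a signed difference. The sketch below copies the scores from the added table (the `Avg` row is left out) and sorts tasks by INT4 minus BF16; the names `SCORES` and `score_deltas` are illustrative only.

```python
# Scores copied from the table above: task -> (BF16, INT4).
SCORES = {
    "leaderboard_mmlu_pro": (0.5919, 0.5864),
    "leaderboard_ifeval inst_level_strict_acc": (0.7770, 0.7866),
    "leaderboard_ifeval prompt_level_strict_acc": (0.6858, 0.6932),
    "mmlu": (0.8334, 0.8308),
    "cmmlu": (0.8727, 0.8673),
    "ceval-valid": (0.8975, 0.8960),
    "gsm8k": (0.9037, 0.9098),
    "lambada_openai": (0.7518, 0.7563),
    "hellaswag": (0.7031, 0.7014),
    "winogrande": (0.7601, 0.7687),
    "piqa": (0.8313, 0.8232),
    "truthfulqa_mc1": (0.5239, 0.5263),
    "openbookqa": (0.3860, 0.3820),
    "boolq": (0.9049, 0.9046),
    "arc_easy": (0.8632, 0.8611),
    "arc_challenge": (0.6135, 0.6237),
}


def score_deltas(scores):
    """Return task -> (INT4 - BF16), rounded to the table's four decimals."""
    return {task: round(int4 - bf16, 4) for task, (bf16, int4) in scores.items()}


deltas = score_deltas(SCORES)
# Print from the largest drop to the largest gain.
for task, delta in sorted(deltas.items(), key=lambda item: item[1]):
    print(f"{task:45s} {delta:+.4f}")
```

On these numbers no task moves by more than about 0.01 in either direction.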
161
 
162
 
163
 
164
+ ### Generate the model
165
 
166
+ Here is the sample command to generate the model.
 
 
167
 
168
  ```bash
169
+ auto-round \
170
+ --model Qwen/Qwen2.5-72B-Instruct \
 
 
171
  --device 0 \
172
  --group_size 128 \
173
  --nsamples 512 \
174
  --bits 4 \
175
  --iter 1000 \
176
  --disable_eval \
177
+ --model_dtype "fp16" \
178
+ --format 'auto_gptq,auto_round' \
179
  --output_dir "./tmp_autoround"
180
  ```
181
 
 
 
182
  ## Ethical Considerations and Limitations
183
 
184
  The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
 
191
 
192
  Here are a couple of useful links to learn more about Intel's AI software:
193
 
194
+ - Intel Neural Compressor [link](https://github.com/intel/neural-compressor)
 
195
 
196
  ## Disclaimer
197
 
198
  The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
199
 
 
 
200
  ## Cite
201
 
202
  @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }
config.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:5cf5ce83e29709fbad13150e1f42418d71ae6fdd05d544339df913887681ed7c
3
- size 1374
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:577c1aac4c764781fe8a97c6627a7d8b9c870215d3dc373884852d17daa9e859
3
+ size 1388
model-00001-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dddf3fdd44beb20160150391fc689f51ae51e68e3da93589227ad07a3c637225
3
+ size 4977604760
model-00002-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e9996f62a04433689c7fd942cd7f9c2d8baba00400952024b732d4bf9e7c9b32
3
+ size 4893894648
model-00003-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e4b9ef75339a7c23bdc245573be342868ae5e5cc9ba1fcc32c3f11890fe28631
3
+ size 4984871048
model-00004-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2c2ed066ed1aca1c6c72a1982c20c4a4f2a8d60876598fb5fcd9648f7e03b6a4
3
+ size 4976067496
model-00005-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:02d27985f48d8dc38c5a492f922936d4cb64a897dfc32641645235a720eebf59
3
+ size 4893776280
model-00006-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4d4481ce2269fbe94b44799737e9240ab70e38db8fcda3183ae7c2df0597b17e
3
+ size 4893894800
model-00007-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0cf75b65e52d3f3a9367f47f2bd03e0283f0ce82c4f0cf443eb8a6c916ece8f5
3
+ size 4893894808
model-00008-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ea3bb9b89b92f3d738b556c36afb0ffd29662a034608fcf7ac73f48474c462ce
3
+ size 4484842920
model-00009-of-00009.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:220efb428e7b47b14df2fd0e7afb1da052f6d9a1f0b5ea7beeb440dc8247f949
3
+ size 2491416704
model.safetensors.index.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:042c113fd1f8f39103d5bcf865e4eacc3287a136416fab30c7b06185fde4ec12
3
+ size 215614
quantization_config.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:88102526dda3055b482da613fdd3bd449c2877728125c5f4fae365afdab6b59f
3
+ size 574
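
Each three-line entry above is a Git LFS pointer: a spec version, a SHA-256 object id, and the object size in bytes. As a rough sketch, a downloaded file could be checked against its pointer as follows. The names `LfsPointer`, `parse_lfs_pointer`, and `matches_pointer` are hypothetical, and the check reads the whole payload into memory, so the multi-gigabyte shards would need a streaming hash instead.

```python
import hashlib
from typing import NamedTuple


class LfsPointer(NamedTuple):
    version: str
    oid_sha256: str
    size: int


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo!r}")
    return LfsPointer(fields["version"], digest, int(fields["size"]))


def matches_pointer(data: bytes, pointer: LfsPointer) -> bool:
    """Check that the payload has the size and SHA-256 digest the pointer records."""
    return (
        len(data) == pointer.size
        and hashlib.sha256(data).hexdigest() == pointer.oid_sha256
    )


# Pointer for quantization_config.json, copied from the diff above.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:88102526dda3055b482da613fdd3bd449c2877728125c5f4fae365afdab6b59f\n"
    "size 574\n"
)
print(pointer.size)
```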