license: apache-2.0
datasets:
- NeelNanda/pile-10k
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
Model Details
This model is an int4 quantization of Qwen/Qwen2.5-0.5B-Instruct with group_size 128 and symmetric quantization, generated by intel/auto-round. To use the AutoGPTQ format, load the model with revision="7cac2d1".
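To make "int4, group_size 128, symmetric" concrete, here is a minimal NumPy sketch of group-wise symmetric quantization. It uses plain round-to-nearest, which is an assumption made for illustration only: auto-round tunes the rounding with signed gradient descent, and its exact scale computation may differ. The function names `quantize_sym_int4` and `dequantize` are hypothetical and not part of any library.

```python
import numpy as np

def quantize_sym_int4(weights: np.ndarray, group_size: int = 128):
    """Round-to-nearest symmetric int4 quantization with one scale per group.

    Illustrative only: this is NOT the auto-round algorithm.
    """
    rows, cols = weights.shape
    assert cols % group_size == 0, "columns must be divisible by group_size"
    groups = weights.reshape(rows, cols // group_size, group_size)
    # Symmetric scheme: no zero point; the largest magnitude in each group
    # is mapped onto the int4 value 7 (assumed convention).
    max_abs = np.abs(groups).max(axis=-1, keepdims=True)
    scales = np.where(max_abs > 0, max_abs / 7.0, 1.0)
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray, shape) -> np.ndarray:
    """Reconstruct approximate float weights from int4 codes and group scales."""
    return (q.astype(np.float32) * scales).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 1024)).astype(np.float32)  # stand-in for a weight matrix

mse = {}
for g in (128, 32):
    q, s = quantize_sym_int4(w, group_size=g)
    mse[g] = float(np.mean((w - dequantize(q, s, w.shape)) ** 2))
    print(f"group_size={g}: reconstruction MSE={mse[g]:.6f}")
```

On this random matrix the smaller group size gives a lower reconstruction error, at the cost of storing four times as many scales.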
How To Use
INT4 Inference (CPU/HPU/CUDA)
CPU inference requires auto-round version > 0.3.1.
import torch  # used by the optional HPU cast below
from auto_round import AutoRoundConfig  # must be imported for the auto-round format
from transformers import AutoModelForCausalLM, AutoTokenizer

quantized_model_dir = "OPEA/Qwen2.5-0.5B-Instruct-int4-inc"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype="auto",
    device_map="auto",
    # revision="7cac2d1",  # uncomment to load the AutoGPTQ format
)

# Uncomment the following three lines for HPU:
# import habana_frameworks.torch.core as htcore
# import habana_frameworks.torch.hpu as hthpu
# model = model.to(torch.bfloat16).to("hpu")

prompt = "There is a girl who likes adventure,"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
# Render the chat messages into the model's prompt format.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=200,  # adjust to match the official usage
    do_sample=False,  # adjust to match the official usage
)
# Strip the prompt tokens so only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

prompt = "There is a girl who likes adventure,"
## INT4:
"""That's great to hear! What kind of adventure does the girl like? Is there anything specific she enjoys doing or exploring?"""
## BF16:
"""That's great! What kind of adventure does she like?"""
prompt = "9.11和9.8哪个数字大"
##INT4:
"""
要比较9.11和9.8的大小,我们可以按照以下步骤进行:
1. 首先,将两个数都转换为相同的小数形式。这里我们使用小数点前的零来方便比较。
9.11 = 9.1100 (保留两位小数)
9.8 = 9.8000 (保留两位小数)
2. 现在,比较这两个小数:
- 第一位:9 和 9 相等。
- 第二位:第一位是相同的,都是1。
- 第三位:第一个数是1,第二个数是8,所以8 > 1。
因此,9.8大于9.11。
最终答案:9.8更大。
"""
##BF16:
"""
要比较9.11和9.8的大小,我们可以按照以下步骤进行:
1. **直接比较**:将两个数相减:
\[
9.11 - 9.8 = -0.69
\]
2. **理解结果**:-0.69表示的是一个负数。因为9.11比9.8小。
因此,9.8比9.11大。
"""
prompt = "Once upon a time,"
##INT4:
"""I'm sorry, but I don't understand what you're asking me to do or what information you want me to provide. Could you please clarify your question or provide more context? I'd be happy to help if you can give me all the information you need."""
##BF16:
"""once upon a time, there was a young girl named Lily who lived in a small village nestled between two mountains. She had always been fascinated by the natural world and dreamed of exploring it further.
One day, while wandering through the forest, she stumbled upon an old, mysterious book that seemed to have been written on its pages. As she read, she realized that the book contained secrets about the hidden treasures of the earth.
Lily was determined to uncover these secrets and become a true explorer. She spent hours poring over the pages, trying to understand what the author was trying to tell her.
Finally, after many days of research and study, Lily discovered the location of the treasure. It lay deep within the heart of the mountain range, guarded by powerful forces that only those with the right knowledge could reach.
With great excitement, Lily set out on her journey to find the treasure. She traveled for weeks, crossing treacherous terrain and battling fierce beasts along the way. But even as she"""
prompt = "请简短介绍一下阿里巴巴公司"
##INT4:
"""阿里巴巴集团是全球领先的电子商务和云计算服务提供商,成立于1999年。该公司总部位于中国杭州,并在多个国家和地区设有办事处和运营中心。阿里巴巴集团的业务包括在线零售、移动支付、云计算、人工智能等。阿里巴巴集团是中国最大的电子商务平台之一,也是全球最大的电商平台之一。阿里巴巴集团还拥有众多子公司和品牌,如淘宝、天猫、菜鸟网络等。阿里巴巴集团在全球范围内拥有超过20亿活跃用户,每年销售额超过3500亿美元。阿里巴巴集团致力于通过创新和智能化技术推动商业变革,为消费者提供更便捷、更个性化的购物体验。"""
##BF16:
"""阿里巴巴集团是全球最大的电子商务平台之一,成立于1999年。该公司提供包括淘宝、天猫、阿里云等在内的众多产品和服务,是中国乃至全球领先的互联网企业之一。"""
Evaluate the model
pip3 install lm-eval==0.4.5
auto-round --model "OPEA/Qwen2.5-0.5B-Instruct-int4-inc" --eval --eval_bs 16 --tasks leaderboard_ifeval,leaderboard_mmlu_pro,gsm8k,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,cmmlu,ceval-valid
Metric | BF16 | INT4 |
---|---|---|
Avg | 0.4229 | 0.4124 |
leaderboard_mmlu_pro 5 shots | 0.1877 | 0.1678 |
leaderboard_ifeval inst_level_strict_acc | 0.3501 | 0.3441 |
leaderboard_ifeval prompt_level_strict_acc | 0.2107 | 0.2218 |
mmlu | 0.4582 | 0.4434 |
cmmlu | 0.5033 | 0.4542 |
ceval-valid | 0.5327 | 0.4918 |
gsm8k 5 shots | 0.2146 | 0.2267 |
lambada_openai | 0.4968 | 0.4692 |
hellaswag | 0.4062 | 0.3927 |
winogrande | 0.5541 | 0.5675 |
piqa | 0.7051 | 0.7035 |
truthfulqa_mc1 | 0.2693 | 0.2815 |
openbookqa | 0.2400 | 0.2200 |
boolq | 0.6783 | 0.6471 |
arc_easy | 0.6566 | 0.6595 |
arc_challenge | 0.3020 | 0.3072 |
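
As a quick way to read the table, the sketch below recomputes the Avg row as an unweighted mean of the 16 task rows and ranks tasks by absolute accuracy drop (BF16 minus INT4). The scores are copied from the table; the dictionary keys and variable names are illustrative labels, not lm-eval identifiers.

```python
# (BF16, INT4) scores copied from the table above.
scores = {
    "leaderboard_mmlu_pro": (0.1877, 0.1678),
    "leaderboard_ifeval inst_level_strict_acc": (0.3501, 0.3441),
    "leaderboard_ifeval prompt_level_strict_acc": (0.2107, 0.2218),
    "mmlu": (0.4582, 0.4434),
    "cmmlu": (0.5033, 0.4542),
    "ceval-valid": (0.5327, 0.4918),
    "gsm8k": (0.2146, 0.2267),
    "lambada_openai": (0.4968, 0.4692),
    "hellaswag": (0.4062, 0.3927),
    "winogrande": (0.5541, 0.5675),
    "piqa": (0.7051, 0.7035),
    "truthfulqa_mc1": (0.2693, 0.2815),
    "openbookqa": (0.2400, 0.2200),
    "boolq": (0.6783, 0.6471),
    "arc_easy": (0.6566, 0.6595),
    "arc_challenge": (0.3020, 0.3072),
}

# Unweighted means over all task rows.
avg_bf16 = sum(bf16 for bf16, _ in scores.values()) / len(scores)
avg_int4 = sum(int4 for _, int4 in scores.values()) / len(scores)

# Positive drop means INT4 scored lower than BF16 on that task.
drops = {task: bf16 - int4 for task, (bf16, int4) in scores.items()}
largest = sorted(drops, key=drops.get, reverse=True)[:3]

print(f"Avg BF16={avg_bf16:.4f}, Avg INT4={avg_int4:.4f}")
for task in largest:
    print(f"{task}: drop={drops[task]:.4f}")
```

The recomputed means match the Avg row, and the two largest drops fall on the Chinese benchmarks cmmlu and ceval-valid.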
Generate the model
The sample command below generates the model. We observed a larger accuracy drop on Chinese tasks, so we recommend calibrating with a high-quality Chinese dataset or using a smaller group_size such as 32.
auto-round \
    --model Qwen/Qwen2.5-0.5B-Instruct \
    --device 0 \
    --group_size 128 \
    --nsamples 512 \
    --bits 4 \
    --iter 1000 \
    --disable_eval \
    --model_dtype "fp16" \
    --format 'auto_gptq,auto_round' \
    --output_dir "./tmp_autoround"
Ethical Considerations and Limitations
The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
Therefore, before deploying any applications of the model, developers should perform safety testing.
Caveats and Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
Here is a useful link to learn more about Intel's AI software:
- Intel Neural Compressor
Disclaimer
The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
Cite
@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}