---
datasets:
- NeelNanda/pile-10k
base_model:
- THUDM/glm-4v-9b
---


## Model Details

This model is an INT4 quantization of [THUDM/glm-4v-9b](https://huggingface.co/THUDM/glm-4v-9b) with group_size 128 and symmetric quantization, generated by [intel/auto-round](https://github.com/intel/auto-round). To use the AutoGPTQ format instead, load the model with `revision="dbb7900"`.

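With group-wise symmetric quantization, every run of 128 consecutive weights shares one scale and no zero-point, so each weight is stored as a signed 4-bit code. The sketch below is a hypothetical illustration of plain round-to-nearest quantization under that scheme, not auto-round's code. It assumes the scale is `max|w| / 7` and the code range is [-8, 7]. auto-round additionally tunes the rounding and may use a different scale convention. The helper names (`quantize_group`, `quantize_row`, `dequantize_row`) are made up for this example.

```python
import random

BITS = 4
GROUP_SIZE = 128
QMAX = 2 ** (BITS - 1) - 1  # 7
QMIN = -(2 ** (BITS - 1))   # -8


def quantize_group(weights):
    """Quantize one group to signed INT4 codes that share one scale (no zero-point)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / QMAX if max_abs > 0 else 1.0  # assumed scale convention
    # Round to nearest (Python's round() sends exact halves to the even integer), then clamp
    codes = [min(max(round(w / scale), QMIN), QMAX) for w in weights]
    return codes, scale


def quantize_row(row, group_size=GROUP_SIZE):
    """Split one weight row into consecutive groups and quantize each independently."""
    return [quantize_group(row[i:i + group_size]) for i in range(0, len(row), group_size)]


def dequantize_row(groups):
    """Reconstruct approximate weights as code * scale, group by group."""
    return [q * scale for codes, scale in groups for q in codes]


# Usage on a synthetic row of 256 weights (two groups of 128)
random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(256)]
groups = quantize_row(row)
restored = dequantize_row(groups)
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(f"groups: {len(groups)}, max reconstruction error: {max_err:.2e}")
```

Each group's scale comes from its own largest magnitude, so every weight is reconstructed to within half a quantization step. A single outlier also only coarsens the other weights in its own group, not the whole row.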
## How To Use
### INT4 Inference
```python
import torch
from PIL import Image
from auto_round import AutoRoundConfig  # must be imported to load the auto-round format
from transformers import AutoModelForCausalLM, AutoTokenizer
import requests

MODEL_PATH = "OPEA/glm-4v-9b-int4-sym-inc"
DEVICE = 'cuda'

tokenizer = AutoTokenizer.from_pretrained(
    "THUDM/glm-4v-9b",
    trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype="auto",
    trust_remote_code=True,
    device_map=DEVICE,
    # revision="dbb7900",  # uncomment to load the AutoGPTQ format
).to(DEVICE).eval()

content = '描述这张图片'  # "Describe this image"
image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"

# Preparation for inference
image = Image.open(requests.get(image_url, stream=True).raw).convert('RGB')
inputs = tokenizer.apply_chat_template([{"role": "user", "image": image, "content": content}],
                                       add_generation_prompt=True, tokenize=True, return_tensors="pt",
                                       return_dict=True)  # chat mode

inputs = inputs.to(DEVICE)
gen_kwargs = {"max_length": 2500, "do_sample": False, "top_k": 1}
with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0]))

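## INT4 output (translated from Chinese):
## This is a still from the animated film "Peter Rabbit". In the picture, the protagonist Peter Rabbit wears a
## blue jacket, a brown vest and pale yellow trousers, and stands on a path leading to a country cottage. Behind
## him is a stone house surrounded by green grass and colorful flowers. Rolling mountains can be seen in the distance.

## BF16 output (translated from Chinese):
## This is a picture of the animated character Peter Rabbit standing on a country path. He wears a blue jacket,
## a brown vest and khaki trousers, and looks very formal. Behind him is a winding path lined with colorful
## flowers and green grass. At the end of the path is an old stone house; the surroundings are peaceful and beautiful.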

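## More examples: set image_url and content as below, then rerun the preparation and generation steps above.
image_url = "http://images.cocodataset.org/train2017/000000411975.jpg"
content = "图片中的棒球场上有多少人?"  # "How many people are on the baseball field in the picture?"
## INT4 output (translated from Chinese):
## There are four people in the picture: two of them are holding baseball bats, one is standing, and one is crouching.

## BF16 output (translated from Chinese):
## There are four people in the picture: two are children bending over to pick something up, one may be a teacher
## or a parent, and one is the photographer.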

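image_url = "https://intelcorp.scene7.com/is/image/intelcorp/processor-overview-framed-badge:1920-1080?wid=480&hei=270"
content = "这张图片代表哪家公司?"  # "Which company does this image represent?"
## INT4 output (translated from Chinese):
## This image represents Intel. Intel Inside is the marketing brand Intel uses for its processor products. Since its
## launch in 1991, this logo has become synonymous with high-performance computing and appears on many computers
## powered by Intel processors.

## BF16 output (translated from Chinese):
## This image represents Intel. Intel Inside is the marketing program Intel adopted for its processor products. Since
## its launch in 1991, the program has become synonymous with PC performance, signifying that computers with Intel
## Inside processors offer better performance and reliability. Intel is one of the world's leading semiconductor chip
## manufacturers, headquartered in Santa Clara, California, USA.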
```



### Generate the model
Here is the sample command to reproduce the model.
```bash
pip install auto-round
auto-round-mllm \
--model THUDM/glm-4v-9b \
--device 0 \
--group_size 128 \
--bits 4 \
--iters 1000 \
--nsample 512 \
--seqlen 2048 \
--format 'auto_gptq,auto_round' \
--output_dir "./tmp_autoround"
```
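The `--iters` flag sets how many tuning steps auto-round runs. As we understand the paper cited below, each step is a signed-gradient-descent update of a per-weight rounding offset, bounded to [-0.5, 0.5]. The offsets are chosen so that the quantized weights reproduce the full-precision outputs on calibration data (here `--nsample` samples of `--seqlen` tokens). The toy sketch below illustrates that idea on a single linear unit in pure Python. It is not auto-round's implementation. It assumes one quantization group, a straight-through estimator for the gradient of rounding, and a step size of `1 / iters`, and every name in it is hypothetical.

```python
import random

QMIN, QMAX = -8, 7  # signed INT4 code range (assumed)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fake_quantize(weights, scale, offsets):
    """scale * clamp(round(w / scale + v)); one rounding offset v per weight."""
    return [scale * min(max(round(w / scale + v), QMIN), QMAX)
            for w, v in zip(weights, offsets)]


def output_mse(weights, qweights, inputs):
    """Mean squared error between full-precision and quantized outputs."""
    return sum((dot(qweights, x) - dot(weights, x)) ** 2 for x in inputs) / len(inputs)


def tune_rounding(weights, inputs, iters=200, lr=None):
    """Toy signed-gradient search for rounding offsets in [-0.5, 0.5]."""
    lr = 1.0 / iters if lr is None else lr  # assumed step size
    scale = max(abs(w) for w in weights) / QMAX
    offsets = [0.0] * len(weights)  # zero offsets == plain round-to-nearest
    best_offsets = list(offsets)
    best_loss = output_mse(weights, fake_quantize(weights, scale, offsets), inputs)
    for _ in range(iters):
        qweights = fake_quantize(weights, scale, offsets)
        # Straight-through estimator: treat round() and the clamp as identity, so
        # dL/dv_j = mean over inputs of 2 * (q.x - w.x) * scale * x_j
        grads = [0.0] * len(weights)
        for x in inputs:
            err = dot(qweights, x) - dot(weights, x)
            for j, xj in enumerate(x):
                grads[j] += 2.0 * err * scale * xj / len(inputs)
        # Signed update: move each offset by lr against the sign of its gradient, then clip
        offsets = [min(max(v - lr * ((g > 0) - (g < 0)), -0.5), 0.5)
                   for v, g in zip(offsets, grads)]
        loss = output_mse(weights, fake_quantize(weights, scale, offsets), inputs)
        if loss < best_loss:
            best_loss, best_offsets = loss, list(offsets)
    return best_offsets, best_loss, scale


# Usage on synthetic weights and calibration inputs
random.seed(0)
weights = [random.gauss(0.0, 0.05) for _ in range(16)]
inputs = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(32)]
rtn_scale = max(abs(w) for w in weights) / QMAX
rtn_loss = output_mse(weights, fake_quantize(weights, rtn_scale, [0.0] * 16), inputs)
offsets, tuned_loss, scale = tune_rounding(weights, inputs, iters=200)
print(f"round-to-nearest MSE: {rtn_loss:.3e}  tuned MSE: {tuned_loss:.3e}")
```

Starting from zero offsets reproduces round-to-nearest, and the sketch keeps the best offsets it has seen. Its result is therefore never worse than plain rounding on the calibration inputs. Using only the sign of each gradient makes every update the same size, which suits a search over the bounded range of ±0.5.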

## Ethical Considerations and Limitations

The model can produce factually incorrect output and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, this model may generate lewd, biased, or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

## Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here is a useful link to learn more about Intel's AI software:

- Intel Neural Compressor [link](https://github.com/intel/neural-compressor)

## Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

## Cite

@article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

[arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round)