Commit `406a7cf` (parent `003fce4`): "7b -> 13b"

README.md CHANGED:
````diff
@@ -33,11 +33,11 @@ You can download the pre-quantized 4-bit weight models from LMDeploy's [model zoo]
 
 Alternatively, you can quantize 16-bit weights to 4-bit weights following the ["4-bit Weight Quantization"](#4-bit-weight-quantization) section, and then perform inference as per the below instructions.
 
-Take the 4-bit Llama-2-7B model from the model zoo as an example:
+Take the 4-bit Llama-2-13B model from the model zoo as an example:
 
 ```shell
 git-lfs install
-git clone https://huggingface.co/lmdeploy/llama2-chat-7b-w4
+git clone https://huggingface.co/lmdeploy/llama2-chat-13b-w4
 ```
 
 As demonstrated in the command below, first convert the model's layout using `turbomind.deploy`, and then you can interact with the AI assistant in the terminal
@@ -47,7 +47,7 @@ As demonstrated in the command below, first convert the model's layout using `turbomind.deploy`
 ## Convert the model's layout and store it in the default path, ./workspace.
 python3 -m lmdeploy.serve.turbomind.deploy \
     --model-name llama2 \
-    --model-path ./llama2-chat-7b-w4 \
+    --model-path ./llama2-chat-13b-w4 \
     --model-format awq \
     --group-size 128
 
@@ -104,6 +104,7 @@ LMDeploy employs AWQ algorithm for model weight quantization.
 
 ```shell
 python3 -m lmdeploy.lite.apis.auto_awq \
+    --model $HF_MODEL \
     --w_bits 4 \         # Bit number for weight quantization
     --w_sym False \      # Whether to use symmetric quantization for weights
     --w_group_size 128 \ # Group size for weight quantization statistics
````