luow-amd and haoyang-amd committed on
Commit d0eedc4
1 Parent(s): 2feaffe

Update README.md (#7)


- Update README.md (6bb726f3f9635ab01ee3fe20d228e8cfa1c8d445)


Co-authored-by: haoyanli <haoyang-amd@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +6 -4
README.md CHANGED
@@ -8,7 +8,7 @@ base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
 - ## Introduction
 This model was created by applying [Quark](https://quark.docs.amd.com/latest/index.html) with calibration samples from the Pile dataset.
 - ## Quantization Strategy
-- ***Quantized Layers***: All linear layers excluding "lm_head", "*gate"
+- ***Quantized Layers***: All linear layers excluding "lm_head", "*.gate"
 - ***Weight***: FP8 symmetric per-tensor
 - ***Activation***: FP8 symmetric per-tensor
 - ***KV Cache***: FP8 symmetric per-tensor
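The first hunk tightens the exclusion glob from `"*gate"` to `"*.gate"`, so only modules whose final name component is `gate` (Mixtral's MoE routers) stay unquantized, alongside `lm_head`. A minimal sketch of the difference, assuming `fnmatch`-style glob matching (Quark's actual matcher may differ, and the module names are illustrative):

```python
from fnmatch import fnmatch

# Illustrative module names in the style of Mixtral's layer naming.
names = [
    "lm_head",
    "model.layers.0.block_sparse_moe.gate",        # MoE router
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.block_sparse_moe.experts.0.w1",
]

def excluded(name: str, patterns: list[str]) -> bool:
    """True if the module name matches any exclusion glob."""
    return any(fnmatch(name, p) for p in patterns)

for name in names:
    action = "skip" if excluded(name, ["lm_head", "*.gate"]) else "quantize"
    print(f"{name} -> {action}")

# "*.gate" requires a literal '.' before "gate"; the older "*gate"
# would also have excluded any module name merely ending in "gate".
```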
@@ -23,16 +23,18 @@ python3 quantize_quark.py \
   --output_dir Mixtral-8x7B-Instruct-v0.1-FP8-KV \
   --quant_scheme w_fp8_a_fp8 \
   --kv_cache_dtype fp8 \
-  --num_calib_data 128 \
-  --model_export quark_safetensors
+  --num_calib_data 128 \
+  --model_export quark_safetensors \
+  --no_weight_matrix_merge
 # If the model is too large for a single GPU, please use multiple GPUs instead.
 python3 quantize_quark.py \
   --model_dir $MODEL_DIR \
   --output_dir Mixtral-8x7B-Instruct-v0.1-FP8-KV \
   --quant_scheme w_fp8_a_fp8 \
   --kv_cache_dtype fp8 \
-  --num_calib_data 128 \
+  --num_calib_data 128 \
   --model_export quark_safetensors \
+  --no_weight_matrix_merge \
   --multi_gpu
 ```
 ## Deployment
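Both invocations calibrate scales on 128 Pile samples (`--num_calib_data 128`) and export Quark-format safetensors. As a rough illustration of what "FP8 symmetric per-tensor" means for a single tensor, here is a minimal PyTorch sketch; the 448.0 bound is the OCP e4m3 maximum, while the function names are illustrative stand-ins rather than Quark's API:

```python
import torch

FP8_E4M3_MAX = 448.0  # largest finite value in OCP fp8 e4m3

def fp8_quantize_per_tensor(x: torch.Tensor):
    """Symmetric per-tensor FP8: one scale for the whole tensor, zero-point 0."""
    scale = x.abs().max().clamp(min=1e-12) / FP8_E4M3_MAX
    q = (x / scale).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX).to(torch.float8_e4m3fn)
    return q, scale

def fp8_dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)
q, scale = fp8_quantize_per_tensor(w)
err = (fp8_dequantize(q, scale) - w).abs().max()
print(f"scale={scale.item():.3e}, max abs error={err.item():.3e}")
```

Weight scales can be taken directly from the tensor as above; activation and KV-cache scales are what the calibration pass estimates, typically from observed maxima over the calibration batches.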
 
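The Deployment section is truncated in this view. As a hedged sketch rather than the card's verbatim instructions, an FP8-KV checkpoint like this is normally served with the FP8 KV cache enabled; the repo id below and vLLM's support for the Quark export format are assumptions to verify against the full model card:

```python
from vllm import LLM, SamplingParams

# Assumes a vLLM build (e.g. a ROCm build) that recognizes this
# checkpoint's Quark FP8 export; kv_cache_dtype="fp8" turns on the
# FP8 KV cache the model was calibrated for.
llm = LLM(
    model="amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV",  # assumed repo id
    kv_cache_dtype="fp8",
)
outputs = llm.generate(
    ["What does FP8 KV-cache quantization save?"],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```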