pkumc yuanzu committed
Commit e48a514 · verified · 1 Parent(s): 160722a

Update README.md (#6)


- Update README.md (47346e4d832f26979f43a25b069688934c3f4cc8)


Co-authored-by: laixinn <yuanzu@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -9,15 +9,15 @@ The INT8 data type is both friendly and efficient for most hardware platforms.
 
 **We provide a block-wise INT8 weight for DeepSeek-R1.**
 
-In benchmarking, we observe **no accuracy loss** and up to **30\%** performance enhancement.
+In benchmarking, we observe **no accuracy loss** and up to **33\%** performance enhancement.
 
 [SGLang](https://github.com/sgl-project/sglang/tree/main) will soon support the block-wise INT8 quantization operation once our [PULL REQUEST](https://github.com/sgl-project/sglang/pull/3730) is merged.
 
 ## 1. Benchmarking Result (detailed in [PULL REQUEST](https://github.com/sgl-project/sglang/pull/3730)):
-| Model | Config | Accuracy (GSM8K) | Accuracy (MMLU) | Output Throughput(qps=128) | Output Throughput(bs=1) |
-|--------|--------|-------------------|----------------|------------------------------|--------------------------|
-| BF16 R1 | A100\*32 | 95.5 | 87.1 | 3342.29 | 37.20 |
-| INT8 R1 | (A100\*16)x2 | **95.8** | **87.1** | 4450.02 **(+33%)** | 44.18 **(+18%)** |
+| Model | Config | Accuracy (GSM8K) | Accuracy (MMLU) | Output Throughput(qps=128) |
+|--------|--------|-------------------|----------------|------------------------------|
+| BF16 R1 | A100\*32 | 95.5 | 87.1 | 3342.29 |
+| INT8 R1 | (A100\*16)x2 | **95.8** | **87.1** | 4450.02 **(+33%)** |
 
 ## 2. Quantization Process
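For context on what a "block-wise INT8 weight" means here: each weight matrix is split into fixed-size tiles, and one floating-point scale is stored per tile instead of per tensor, which limits how far a single outlier can degrade the rest of the matrix. The sketch below is illustrative only; the block size of 128×128 and the function names are assumptions, and the authoritative implementation is the SGLang pull request linked above.

```python
# Illustrative sketch of symmetric, block-wise INT8 weight quantization.
# The 128x128 block size and function names are assumptions for this
# example; the actual scheme is defined in the referenced SGLang PR.
import numpy as np


def quantize_blockwise_int8(w: np.ndarray, block: int = 128):
    """Quantize a 2-D float weight matrix per (block x block) tile.

    Returns (int8 weights, one float32 scale per tile).
    """
    rows, cols = w.shape
    n_r = (rows + block - 1) // block
    n_c = (cols + block - 1) // block
    q = np.empty(w.shape, dtype=np.int8)
    scales = np.empty((n_r, n_c), dtype=np.float32)
    for i in range(n_r):
        for j in range(n_c):
            tile = w[i * block:(i + 1) * block, j * block:(j + 1) * block]
            # One symmetric scale per tile: map the tile's max |value| to 127.
            s = float(np.abs(tile).max()) / 127.0
            if s == 0.0:
                s = 1.0  # all-zero tile; any scale round-trips correctly
            scales[i, j] = s
            q[i * block:(i + 1) * block, j * block:(j + 1) * block] = (
                np.clip(np.round(tile / s), -127, 127).astype(np.int8)
            )
    return q, scales


def dequantize_blockwise_int8(q: np.ndarray, scales: np.ndarray,
                              block: int = 128) -> np.ndarray:
    """Reconstruct float weights by rescaling each tile."""
    w = q.astype(np.float32)
    for i in range(scales.shape[0]):
        for j in range(scales.shape[1]):
            w[i * block:(i + 1) * block, j * block:(j + 1) * block] *= scales[i, j]
    return w
```

Because the scale is chosen per tile, the round-trip error of any element is bounded by half of its tile's scale, which is the intuition behind the "no accuracy loss" claim in the benchmark above.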