JosephusCheung committed on
Commit • a8ba114 • 1 Parent(s): 1a68e27
Update README.md
README.md CHANGED
@@ -11,6 +11,8 @@ There are some issues with the model weights in terms of precision. In the next
**Please note:** For now, do not use accelerated inference frameworks such as **vLLM**; use Transformers for inference instead. Otherwise, due to precision issues, the output quality will be significantly degraded. If you need faster inference, consider using the q8_0 quantization with llama.cpp for the time being (for this model only, it is faster and better than bf16 under vLLM), or wait for the official version.

To be fixed in the upcoming version update.

+ **no repetition_penalty!**
+
Please do not use wikitext for quantization calibration, because all wikitext has been re-aligned on a synthetic dataset and its distribution differs significantly from the original wikitext.

## MT-Bench: 8.5
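As a quick illustration of the notes above (Transformers inference, no repetition penalty), here is a minimal sketch. The model id, prompt, and sampling settings are placeholders of my own, not values from this commit; `repetition_penalty=1.0` is the Transformers default, i.e. the penalty is disabled:

```python
# Minimal sketch: plain Transformers inference with no repetition penalty.
# "your-org/your-model" is a placeholder model id, not from this commit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-model"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# repetition_penalty stays at its default of 1.0 (disabled), per the note above.
output = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.0,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```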
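For the calibration warning, a hedged sketch of what "don't calibrate on wikitext" could look like in practice, using AutoGPTQ with in-domain chat-style samples instead. The calibration texts, model id, and output path are illustrative assumptions, not part of the original README:

```python
# Minimal sketch: GPTQ calibration on in-domain (non-wikitext) samples via AutoGPTQ.
# Model id, output path, and calibration texts are illustrative placeholders.
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

model_id = "your-org/your-model"   # placeholder
out_dir = "your-model-gptq-4bit"   # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Calibration samples drawn from chat-style / synthetic-aligned text,
# NOT from wikitext (see the distribution-mismatch note above).
calibration_texts = [
    "User: Explain quantization calibration in one paragraph.\nAssistant: ...",
    "User: Summarize the trade-offs of 4-bit GPTQ.\nAssistant: ...",
]
examples = [tokenizer(t) for t in calibration_texts]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)
model.quantize(examples)
model.save_quantized(out_dir)
```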