Update README.md
README.md CHANGED
@@ -2,14 +2,18 @@
 license: apache-2.0
 language:
 - en
+- zh
 tags:
 - chat
 pipeline_tag: text-generation
 library_name: transformers
+base_model:
+- Qwen/Qwen2.5-72B-Instruct
+- anthracite-org/magnum-v4-72b
 ---
 
 ## AWQ Quantization Note
-My favorite model is Qwen2.5-72B-Instruct, but it responds a little dry sometimes, so I tried this model to see if it provided better responses. Unfortunately, it doesn't perform as well for my primary RAG/tool use cases, which require
+My favorite model is Qwen2.5-72B-Instruct, but it responds a little dry sometimes, so I tried this model to see if it provided better responses. Unfortunately, it doesn't perform as well for my primary RAG/tool use cases, which require stricter adherence to previous context.
 
 Qwen2.5-72B and derived models need an extra padding step to quantize to AWQ in a way that supports tensor parallelism with vLLM and other services, so in case others find this model suitable for their needs, I'm uploading my AWQ 4-bit quant, which first follows the padding step at the bottom of [this page](https://qwen.readthedocs.io/en/latest/quantization/gptq.html).
 
@@ -208,4 +212,4 @@ We used 8x mi300x GPUs graciously provided by [DoctorShotgun](https://huggingfa
 [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
 
 ## Safety
-...
+...
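For reference, the padding mentioned in the note above amounts to zero-extending the MLP weights so that `intermediate_size`, once split across tensor-parallel ranks, stays divisible by the AWQ group size. Below is a minimal sketch of the idea, not the exact snippet from the linked Qwen page: it assumes Qwen2.5-72B's shapes (`intermediate_size` 29568, padded to 29696, the next multiple of 128 × 8 for group size 128 and up to 8-way TP), appends the zero rows/columns at the end rather than interleaving them, and uses a hypothetical output path. The padding is functionally a no-op because the extra gate/up channels always produce zero activations (`silu(0) * 0 = 0`), and the matching `down_proj` columns are zero as well.

```python
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "Qwen/Qwen2.5-72B-Instruct"    # or the merged model being quantized
dst = "Qwen2.5-72B-Instruct-padded"  # hypothetical output path

# Loading a 72B model in full precision on CPU needs roughly 150 GB of RAM.
model = AutoModelForCausalLM.from_pretrained(src, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(src)

old_size = model.config.intermediate_size  # 29568 for Qwen2.5-72B
new_size = 29696                           # next multiple of 1024 = 128 (group size) * 8 (TP)
pad = new_size - old_size

for layer in model.model.layers:
    mlp = layer.mlp
    # gate/up map hidden -> intermediate: append zero output rows.
    mlp.gate_proj.weight.data = F.pad(mlp.gate_proj.weight.data, (0, 0, 0, pad))
    mlp.up_proj.weight.data = F.pad(mlp.up_proj.weight.data, (0, 0, 0, pad))
    # down maps intermediate -> hidden: append matching zero input columns.
    mlp.down_proj.weight.data = F.pad(mlp.down_proj.weight.data, (0, pad))

model.config.intermediate_size = new_size
model.save_pretrained(dst)  # run AWQ quantization on this padded checkpoint
tokenizer.save_pretrained(dst)
```

The AWQ quantization then runs on the padded checkpoint, and vLLM can shard the 29696-wide MLP across up to 8 GPUs without tripping over the group size.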