CED6688 committed
Commit a320ab0 · verified · 1 parent: 0239597

Update README.md

Files changed (1): README.md (+6 -2)
README.md CHANGED
@@ -2,14 +2,18 @@
 license: apache-2.0
 language:
 - en
+- zh
 tags:
 - chat
 pipeline_tag: text-generation
 library_name: transformers
+base_model:
+- Qwen/Qwen2.5-72B-Instruct
+- anthracite-org/magnum-v4-72b
 ---

 ## AWQ Quantization Note
-My favorite model is Qwen2.5-72B-Instruct, but it responds a little dry sometimes, so I tried this model to see if it provided better response. Unfortunately, it doesn't perform as well for my primary RAG/tools use cases that require more stricter adherance to previous context.
+My favorite model is Qwen2.5-72B-Instruct, but it responds a little dry sometimes, so I tried this model to see if it provided better responses. Unfortunately, it doesn't perform as well for my primary RAG/tools use cases that require stricter adherence to previous context.

 Qwen2.5-72B and derived models have an extra padding step required to quantize to AWQ in a way that supports tensor parallelism with vLLM and other services, so in the event that others find this model suitable for their needs, I'm uploading my AWQ 4-bit quant, which first follows the padding step at the bottom of [this page](https://qwen.readthedocs.io/en/latest/quantization/gptq.html).

@@ -208,4 +212,4 @@ We used 8x mi300x GPUs graciously provided by [DoctorShotgun](https://huggingfa
 [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

 ## Safety
-...
+...
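
For readers unfamiliar with the padding step mentioned in the README paragraph above: Qwen2.5-72B's `intermediate_size` (29568) does not divide evenly once the AWQ group size and tensor-parallel degree are factored in, so the MLP weights are zero-padded before quantization. The sketch below only illustrates the general idea; it is not the exact script from the linked Qwen documentation, the target size of 29696 and all paths are assumptions, and the authoritative procedure is the one at the bottom of that page.

```python
# Rough sketch of the idea behind the padding step (the authoritative script is at the
# bottom of the linked Qwen docs page). Assumptions: a Qwen2.5-72B-style checkpoint with
# intermediate_size 29568, padded here to 29696; the source and output paths are hypothetical.
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

src = "anthracite-org/magnum-v4-72b"  # hypothetical source checkpoint
dst = "magnum-v4-72b-padded"          # hypothetical output directory
pad_to = 29696                        # assumed target divisible by group size x TP degree

model = AutoModelForCausalLM.from_pretrained(src, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(src)

extra = pad_to - model.config.intermediate_size  # 29696 - 29568 = 128

state = model.state_dict()
for name, w in state.items():
    if name.endswith(("mlp.gate_proj.weight", "mlp.up_proj.weight")):
        # Rows index intermediate_size: append zero output channels.
        state[name] = torch.cat([w, w.new_zeros(extra, w.shape[1])], dim=0)
    elif name.endswith("mlp.down_proj.weight"):
        # Columns index intermediate_size: append matching zero input channels,
        # so the padded activations contribute nothing and outputs are unchanged.
        state[name] = torch.cat([w, w.new_zeros(w.shape[0], extra)], dim=1)

# Rebuild the model around the padded config, load the padded weights, and save
# the result; AWQ quantization is then run on this padded checkpoint.
cfg = AutoConfig.from_pretrained(src)
cfg.intermediate_size = pad_to
padded = AutoModelForCausalLM.from_config(cfg, torch_dtype=torch.bfloat16)
padded.load_state_dict(state)
padded.save_pretrained(dst)
tokenizer.save_pretrained(dst)
```

After a padding step along these lines, the checkpoint can be quantized to 4-bit AWQ as usual, with the intent that the resulting quant serves under vLLM tensor parallelism as the README paragraph above describes.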