Update README.md
README.md CHANGED
@@ -2,14 +2,18 @@
 license: apache-2.0
 language:
 - en
+- zh
 tags:
 - chat
 pipeline_tag: text-generation
 library_name: transformers
+base_model:
+- Qwen/Qwen2.5-72B-Instruct
+- anthracite-org/magnum-v4-72b
 ---
 
 ## AWQ Quantization Note
-My favorite model is Qwen2.5-72B-Instruct, but it responds a little dry sometimes, so I tried this model to see if it provided better responses. Unfortunately, it doesn't perform as well for my primary RAG/tool use cases, which require
+My favorite model is Qwen2.5-72B-Instruct, but it responds a little dry sometimes, so I tried this model to see if it provided better responses. Unfortunately, it doesn't perform as well for my primary RAG/tool use cases, which require stricter adherence to previous context.
 
 Qwen2.5-72B and derived models need an extra padding step to quantize to AWQ in a way that supports tensor parallelism with vLLM and other services, so in case others find this model suitable for their needs, I'm uploading my AWQ 4-bit quant, which first follows the padding step at the bottom of [this page](https://qwen.readthedocs.io/en/latest/quantization/gptq.html).
 
@@ -208,4 +212,4 @@ We used 8x mi300x GPUs graciously provided by [DoctorShotgun](https://huggingfa
 [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
 
 ## Safety
-...
+...
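For reference, the padding mentioned in the note above amounts to zero-extending the MLP weights so that `intermediate_size`, once split across tensor-parallel ranks, stays divisible by the AWQ group size. Below is a minimal sketch of the idea, not the exact snippet from the linked Qwen page: it assumes Qwen2.5-72B's shapes (`intermediate_size` 29568, padded to 29696, the next multiple of 128 × 8 for group size 128 and up to 8-way TP), appends the zero rows/columns at the end rather than interleaving them, and uses a hypothetical output path. The padding is functionally a no-op because the extra gate/up channels always produce zero activations (`silu(0) * 0 = 0`), and the matching `down_proj` columns are zero as well.

```python
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "Qwen/Qwen2.5-72B-Instruct"    # or the merged model being quantized
dst = "Qwen2.5-72B-Instruct-padded"  # hypothetical output path

# Loading a 72B model in full precision on CPU needs roughly 150 GB of RAM.
model = AutoModelForCausalLM.from_pretrained(src, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(src)

old_size = model.config.intermediate_size  # 29568 for Qwen2.5-72B
new_size = 29696                           # next multiple of 1024 = 128 (group size) * 8 (TP)
pad = new_size - old_size

for layer in model.model.layers:
    mlp = layer.mlp
    # gate/up map hidden -> intermediate: append zero output rows.
    mlp.gate_proj.weight.data = F.pad(mlp.gate_proj.weight.data, (0, 0, 0, pad))
    mlp.up_proj.weight.data = F.pad(mlp.up_proj.weight.data, (0, 0, 0, pad))
    # down maps intermediate -> hidden: append matching zero input columns.
    mlp.down_proj.weight.data = F.pad(mlp.down_proj.weight.data, (0, pad))

model.config.intermediate_size = new_size
model.save_pretrained(dst)  # run AWQ quantization on this padded checkpoint
tokenizer.save_pretrained(dst)
```

The AWQ quantization then runs on the padded checkpoint, and vLLM can shard the 29696-wide MLP across up to 8 GPUs without tripping over the group size.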