---
language:
- en
- fr
- de
- es
- it
- pt
- zh
- ja
- ru
- ko
license: other
license_name: mrl
inference: false
license_link: https://mistral.ai/licenses/MRL-0.1.md
base_model:
- anthracite-org/magnum-v4-123b
---

# Magnum-v4-123b HQQ

This repo contains magnum-v4-123b quantized to 4-bit precision using [HQQ](https://github.com/mobiusml/hqq/).

HQQ provides a similar level of precision to AWQ at 4-bit, but with no need for calibration.
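Only the tail of the quantization script (the save step) survives in this diff. Everything above the tokenizer line in the block below is a reconstructed sketch: it assumes transformers' built-in HQQ support via `HqqConfig` with typical 4-bit settings, not necessarily the exact configuration used for this repo.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, HqqConfig

model_path = "anthracite-org/magnum-v4-123b"

# Assumed settings: 4-bit HQQ with group size 64. HQQ quantizes weights
# directly from their values, so no calibration dataset is required.
quant_config = HqqConfig(nbits=4, group_size=64)

# Weights are quantized on the fly as they are loaded.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
    quantization_config=quant_config,
)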
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Save the quantized weights and tokenizer for upload.
output_path = "magnum-v4-123b-hqq-4bit"
model.save_pretrained(output_path)
tokenizer.save_pretrained(output_path)
```

## Inference

You can perform inference directly with transformers, or using [aphrodite](https://github.com/PygmalionAI/aphrodite-engine):

```sh
pip install aphrodite-engine

aphrodite run alpindale/magnum-v4-123b-hqq-4bit -tp 2
```
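
Here `-tp 2` shards the model across two GPUs with tensor parallelism, which a 123b model typically needs even at 4-bit. For the transformers route, a minimal sketch (the prompt and generation settings are illustrative only):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "alpindale/magnum-v4-123b-hqq-4bit"

# The checkpoint was saved with its HQQ quantization config, so a plain
# from_pretrained call should restore the 4-bit weights.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Write a short scene:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```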