Update README.md
README.md CHANGED

@@ -14,6 +14,8 @@ quantized_by: bartowski
 pipeline_tag: text-generation
 ---
 
+# Eric has pulled this model due to decreased performance, will leave the quants up but downloader beware, performance isn't what was expected
+
 ## Exllama v2 Quantizations of dolphin-2.6.1-mixtral-8x7b
 
 Using <a href="https://github.com/turboderp/exllamav2/releases/tag/v0.0.11">turboderp's ExLlamaV2 v0.0.11</a> for quantization.
@@ -24,7 +26,7 @@ Conversion was done using the default calibration dataset.
 
 Default arguments used except when the bits per weight is above 6.0, at that point the lm_head layer is quantized at 8 bits per weight instead of the default 6.
 
-Original model: https://huggingface.co/cognitivecomputations/dolphin-2.6.1-mixtral-8x7b
+Original model: ~https://huggingface.co/cognitivecomputations/dolphin-2.6.1-mixtral-8x7b~
 
 <a href="https://huggingface.co/bartowski/dolphin-2.6.1-mixtral-8x7b-exl2/tree/3_0">3.0 bits per weight</a>
 
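For context on the quantization settings described in the README above, here is a minimal sketch of how a >6.0 bpw ExLlamaV2 quant with an 8-bit lm_head could be produced with exllamav2's convert.py. This is not part of the model card; the bpw value and all paths are placeholders, and only the documented convert.py flags (-i, -o, -cf, -b, -hb) are used.

```python
# Sketch: reproduce an exl2 quant where bpw > 6.0 also bumps lm_head to 8 bits.
# Assumes convert.py from the exllamav2 repo is in the current directory and the
# source model has been downloaded locally; directory names are placeholders.
import subprocess

bpw = 6.5  # example target; anything above 6.0 per the note in the README

cmd = [
    "python", "convert.py",
    "-i", "dolphin-2.6.1-mixtral-8x7b",                    # unquantized source model
    "-o", "work_dir",                                       # scratch/working directory
    "-cf", f"dolphin-2.6.1-mixtral-8x7b-{bpw}bpw-exl2",     # output directory for the finished quant
    "-b", str(bpw),                                         # target bits per weight
]
if bpw > 6.0:
    cmd += ["-hb", "8"]  # quantize lm_head at 8 bits instead of the default 6

subprocess.run(cmd, check=True)
```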