brucethemoose committed
Commit 8f0641c
1 Parent(s): 13a3c14

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -9,13 +9,13 @@ pipeline_tag: text-generation
 tags:
 - text-generation-inference
 ---
-[**Nous-Capybara-34B**](https://huggingface.co/NousResearch/Nous-Capybara-34B/), [**Tess-M-v1.4**](https://huggingface.co/migtissera/Tess-34B-v1.4), [**Airoboros-3_1-yi-34b-200k**](https://huggingface.co/bhenrym14/airoboros-3_1-yi-34b-200k), [**PlatYi-34B-200K-Q**](https://huggingface.co/kyujinpy/PlatYi-34B-200k-Q-FastChat), [**Pallas-0.4**](https://huggingface.co/Mihaiii/Pallas-0.4), [**Yi-34B-200K-AEZAKMI-v2**](https://huggingface.co/adamo1139/Yi-34B-200K-AEZAKMI-v2), and a tiny bit of [**SUS-Chat-34B**](https://huggingface.co/SUSTech/SUS-Chat-34B) merged with a new, experimental implementation of "dare ties" via mergekit. See:
+[**Nous-Capybara-34B**](https://huggingface.co/NousResearch/Nous-Capybara-34B/), [**Tess-M-v1.4**](https://huggingface.co/migtissera/Tess-34B-v1.4), [**Airoboros-3_1-yi-34b-200k**](https://huggingface.co/bhenrym14/airoboros-3_1-yi-34b-200k), [**PlatYi-34B-200K-Q**](https://huggingface.co/kyujinpy/PlatYi-34B-200k-Q-FastChat), [**Pallas-0.4**](https://huggingface.co/Mihaiii/Pallas-0.4), [**Yi-34B-200K-AEZAKMI-v2**](https://huggingface.co/adamo1139/Yi-34B-200K-AEZAKMI-v2), and a tiny bit of [**SUS-Chat-34B**](https://huggingface.co/SUSTech/SUS-Chat-34B) merged with a new, experimental implementation of "dare ties" via mergekit.
 
 See the main model card: https://huggingface.co/brucethemoose/Yi-34B-200K-DARE-merge-v5
 
-The merge was then quantized with exllamav2 0.0.11's new exl2 quantization, using 300K tokens from a sci-fi story, a fantasy story, and a Vicuna-format chat as profiling data, at a high context size. This should result in excellent writing performance for the model size.
+The merge was then quantized with exllamav2 0.0.11's brand-new exl2 quantization, using 300K tokens from a sci-fi story, a fantasy story, and a Vicuna-format chat as profiling data, at a high context size. This should result in excellent writing performance for the model size.
 
-This quantization can fit ~**45K context on a 24GB GPU**.
+This 4bpw quantization can fit ~**45K context on a 24GB GPU** at high quality.
 ***
 ## Prompt template: Orca-Vicuna
 ```
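For readers unfamiliar with the merge method mentioned above: the sketch below is a rough illustration of what a DARE-TIES merge does to each weight tensor (randomly drop and rescale each model's delta from the base, then keep only deltas that agree with the majority sign). It is not mergekit's actual implementation; the function name, signature, and normalization details are simplifications for illustration.

```python
import torch

def dare_ties(base, finetuned, weights, density=0.5):
    """Illustrative DARE-TIES merge of several fine-tunes into one base tensor.

    base:      a weight tensor from the base model (e.g. Yi-34B-200K)
    finetuned: list of corresponding tensors, one per fine-tuned model
    weights:   per-model merge weights
    density:   fraction of delta parameters kept by DARE's random drop
    """
    deltas = []
    for ft, w in zip(finetuned, weights):
        delta = ft - base                                   # "task vector"
        mask = torch.bernoulli(torch.full_like(delta, density))
        deltas.append(w * delta * mask / density)           # DARE: drop, then rescale

    stacked = torch.stack(deltas)
    # TIES sign election: keep only deltas agreeing with the majority sign
    elected_sign = torch.sign(stacked.sum(dim=0))
    agree = torch.sign(stacked) == elected_sign
    kept = stacked * agree
    # average the surviving deltas and apply them to the base weights
    denom = agree.sum(dim=0).clamp(min=1)
    return base + kept.sum(dim=0) / denom

# e.g.: merged_W = dare_ties(base_W, [capybara_W, tess_W], weights=[0.5, 0.5])
```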
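To run the exl2 quant, something like the following should work with exllamav2's Python API; this is patterned on the library's example scripts from the 0.0.11 era, the model path and sampler settings are placeholders, and the prompt follows the Orca-Vicuna format named in the diff above.

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/this/exl2/quant"  # local download of this repo
config.prepare()
config.max_seq_len = 45056  # ~45K context; lower this if you OOM on 24GB

model = ExLlamaV2(config)
model.load()
tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model)  # KV cache sized to config.max_seq_len

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

# Orca-Vicuna prompt format
prompt = "SYSTEM: You are a helpful assistant.\nUSER: Hello!\nASSISTANT:"
print(generator.generate_simple(prompt, settings, 200))
```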
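As a rough sanity check on the ~45K-context claim, here is the back-of-the-envelope KV-cache arithmetic, assuming Yi-34B's published geometry (60 layers, 8 GQA key/value heads, head dim 128):

```python
# Bytes per token of FP16 KV cache: K and V, 2 bytes per element
layers, kv_heads, head_dim = 60, 8, 128
bytes_per_token = 2 * layers * kv_heads * head_dim * 2
print(bytes_per_token / 2**20)                 # ~0.23 MiB per token
print(45_000 * bytes_per_token / 2**30)        # ~10.3 GiB of FP16 cache at 45K
print(34.4e9 * 4 / 8 / 2**30)                  # ~16 GiB of weights at 4bpw
```

Since ~16 GiB of weights plus ~10 GiB of FP16 cache slightly overshoots 24GB, the claim presumably relies on exllamav2's 8-bit KV cache (ExLlamaV2Cache_8bit), which roughly halves the cache footprint; treat the geometry and cache-mode details here as assumptions rather than something stated in this commit.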