brucethemoose committed
Commit 8f0641c
1 Parent(s): 13a3c14

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -9,13 +9,13 @@ pipeline_tag: text-generation
 tags:
 - text-generation-inference
 ---
-[**Nous-Capybara-34B**](https://huggingface.co/NousResearch/Nous-Capybara-34B/), [**Tess-M-v1.4**](https://huggingface.co/migtissera/Tess-34B-v1.4), [**Airoboros-3_1-yi-34b-200k**](https://huggingface.co/bhenrym14/airoboros-3_1-yi-34b-200k), [**PlatYi-34B-200K-Q**](https://huggingface.co/kyujinpy/PlatYi-34B-200k-Q-FastChat), [**Pallas-0.4**](https://huggingface.co/Mihaiii/Pallas-0.4), [**Yi-34B-200K-AEZAKMI-v2**](https://huggingface.co/adamo1139/Yi-34B-200K-AEZAKMI-v2), and a tiny bit of [**SUS-Chat-34B**](https://huggingface.co/SUSTech/SUS-Chat-34B) merged with a new, experimental implementation of "dare ties" via mergekit. See:
+[**Nous-Capybara-34B**](https://huggingface.co/NousResearch/Nous-Capybara-34B/), [**Tess-M-v1.4**](https://huggingface.co/migtissera/Tess-34B-v1.4), [**Airoboros-3_1-yi-34b-200k**](https://huggingface.co/bhenrym14/airoboros-3_1-yi-34b-200k), [**PlatYi-34B-200K-Q**](https://huggingface.co/kyujinpy/PlatYi-34B-200k-Q-FastChat), [**Pallas-0.4**](https://huggingface.co/Mihaiii/Pallas-0.4), [**Yi-34B-200K-AEZAKMI-v2**](https://huggingface.co/adamo1139/Yi-34B-200K-AEZAKMI-v2), and a tiny bit of [**SUS-Chat-34B**](https://huggingface.co/SUSTech/SUS-Chat-34B) merged with a new, experimental implementation of "dare ties" via mergekit.
 
 See the main model card: https://huggingface.co/brucethemoose/Yi-34B-200K-DARE-merge-v5
 
-The merge was then quantized with exllamav2 0.0.11's new exl2 quantization, using 300K tokens from a sci-fi story, a fantasy story, and a Vicuna-format chat as profiling data, at a high context size. This should result in excellent writing performance for the model size.
+The merge was then quantized with exllamav2 0.0.11's brand-new exl2 quantization, using 300K tokens from a sci-fi story, a fantasy story, and a Vicuna-format chat as profiling data, at a high context size. This should result in excellent writing performance for the model size.
 
-This quantization can fit ~**45K context on a 24GB GPU**.
+This 4bpw quantization can fit ~**45K context on a 24GB GPU** at high quality.
 ***
 ## Prompt template: Orca-Vicuna
 ```
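For readers unfamiliar with the merge method mentioned above: the sketch below is a rough illustration of what a DARE-TIES merge does to each weight tensor (randomly drop and rescale each model's delta from the base, then keep only deltas that agree with the majority sign). It is not mergekit's actual implementation; the function name, signature, and normalization details are simplifications for illustration.

```python
import torch

def dare_ties(base, finetuned, weights, density=0.5):
    """Illustrative DARE-TIES merge of several fine-tunes into one base tensor.

    base:      a weight tensor from the base model (e.g. Yi-34B-200K)
    finetuned: list of corresponding tensors, one per fine-tuned model
    weights:   per-model merge weights
    density:   fraction of delta parameters kept by DARE's random drop
    """
    deltas = []
    for ft, w in zip(finetuned, weights):
        delta = ft - base                                   # "task vector"
        mask = torch.bernoulli(torch.full_like(delta, density))
        deltas.append(w * delta * mask / density)           # DARE: drop, then rescale

    stacked = torch.stack(deltas)
    # TIES sign election: keep only deltas agreeing with the majority sign
    elected_sign = torch.sign(stacked.sum(dim=0))
    agree = torch.sign(stacked) == elected_sign
    kept = stacked * agree
    # average the surviving deltas and apply them to the base weights
    denom = agree.sum(dim=0).clamp(min=1)
    return base + kept.sum(dim=0) / denom

# e.g.: merged_W = dare_ties(base_W, [capybara_W, tess_W], weights=[0.5, 0.5])
```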
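To run the exl2 quant, something like the following should work with exllamav2's Python API; this is patterned on the library's example scripts from the 0.0.11 era, the model path and sampler settings are placeholders, and the prompt follows the Orca-Vicuna format named in the diff above.

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/this/exl2/quant"  # local download of this repo
config.prepare()
config.max_seq_len = 45056  # ~45K context; lower this if you OOM on 24GB

model = ExLlamaV2(config)
model.load()
tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model)  # KV cache sized to config.max_seq_len

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

# Orca-Vicuna prompt format
prompt = "SYSTEM: You are a helpful assistant.\nUSER: Hello!\nASSISTANT:"
print(generator.generate_simple(prompt, settings, 200))
```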
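As a rough sanity check on the ~45K-context claim, here is the back-of-the-envelope KV-cache arithmetic, assuming Yi-34B's published geometry (60 layers, 8 GQA key/value heads, head dim 128):

```python
# Bytes per token of FP16 KV cache: K and V, 2 bytes per element
layers, kv_heads, head_dim = 60, 8, 128
bytes_per_token = 2 * layers * kv_heads * head_dim * 2
print(bytes_per_token / 2**20)                 # ~0.23 MiB per token
print(45_000 * bytes_per_token / 2**30)        # ~10.3 GiB of FP16 cache at 45K
print(34.4e9 * 4 / 8 / 2**30)                  # ~16 GiB of weights at 4bpw
```

Since ~16 GiB of weights plus ~10 GiB of FP16 cache slightly overshoots 24GB, the claim presumably relies on exllamav2's 8-bit KV cache (ExLlamaV2Cache_8bit), which roughly halves the cache footprint; treat the geometry and cache-mode details here as assumptions rather than something stated in this commit.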