brucethemoose
/

CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-HighDensity

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

brucethemoose commited on Dec 11, 2023

Commit

1c6cd68

•

1 Parent(s): 86c1644

Update README.md

Files changed (1) hide show

README.md +1 -1

README.md CHANGED Viewed

@@ -68,7 +68,7 @@ Being a Yi model, try disabling the BOS token and/or running a lower temperature
 24GB GPUs can run Yi-34B-200K models at **45K-75K context** with exllamav2. I go into more detail in this [post](https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/)
-I recommend exl2 quantizations profiled on data similar to the desired task. It is especially sensitive to the quantization data at low bpw! I published my own quantizations on vicuuna chat + fiction writing here: [4bpw}(https://huggingface.co/brucethemoose/CaPlatTessDolXaBoros-34B-200K-exl2-4bpw-fiction) [3.1bpw](https://huggingface.co/brucethemoose/CaPlatTessDolXaBoros-34B-200K-exl2-4bpw-fiction)
 To load this in full-context backends like transformers and vllm, you *must* change `max_position_embeddings` in config.json to a lower value than 200,000, otherwise you will OOM!
 ***

 24GB GPUs can run Yi-34B-200K models at **45K-75K context** with exllamav2. I go into more detail in this [post](https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/)
+I recommend exl2 quantizations profiled on data similar to the desired task. It is especially sensitive to the quantization data at low bpw! I published my own quantizations on vicuuna chat + fiction writing here: [4bpw](https://huggingface.co/brucethemoose/CaPlatTessDolXaBoros-34B-200K-exl2-4bpw-fiction) [3.1bpw](https://huggingface.co/brucethemoose/CaPlatTessDolXaBoros-34B-200K-exl2-4bpw-fiction)
 To load this in full-context backends like transformers and vllm, you *must* change `max_position_embeddings` in config.json to a lower value than 200,000, otherwise you will OOM!
 ***