InferenceIllusionist committed on
Commit
9712507
1 Parent(s): 4f57fea

Update README.md

Files changed (1)
  1. README.md +1 -0
README.md CHANGED
@@ -29,6 +29,7 @@ Quantized from Mistral-Large-Instruct-2407 123B fp16
  >* If you are getting a `cudaMalloc failed: out of memory` error, try passing a lower context argument to llama.cpp, e.g. for 8k: `-c 8192`
  >* If all of your cards are Ampere generation or newer, you can enable Flash Attention with `-fa`
  >* Provided Flash Attention is enabled, you can also use a quantized KV cache to save VRAM, e.g. for 8-bit: `-ctk q8_0 -ctv q8_0`
+ >* Files are split with llama.cpp's gguf-split. There is no need to manually combine files - just download all files for a specific quant size and load the first file (labeled "00001-")


  Original model card can be found [here](https://huggingface.co/mistralai/Mistral-Large-Instruct-2407)
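
For convenience, here is a minimal sketch of a llama.cpp invocation combining the flags described above. The `llama-cli` binary name and the split GGUF filename are illustrative assumptions, not values taken from this repository:

```sh
# Minimal sketch: the model filename is a placeholder for the first shard
# of whichever quant size was downloaded.
# -c 8192            : cap context at 8k to reduce VRAM use
# -fa                : enable Flash Attention (Ampere or newer GPUs)
# -ctk/-ctv q8_0     : 8-bit quantized KV cache (requires Flash Attention)
./llama-cli \
  -m Mistral-Large-Instruct-2407-Q4_K_M-00001-of-00003.gguf \
  -c 8192 -fa -ctk q8_0 -ctv q8_0
```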