metadata
license: llama3
language:
- ja
- en
base_model: shisa-ai/shisa-v1-llama3-70b
See https://huggingface.co/shisa-ai/shisa-v1-llama3-70b for the original model.
I was seeing corruption issues at extended context length but this appears to be due to how llama.cpp's server
behavior defaulting to a small context window.
See: https://github.com/ggerganov/llama.cpp/issues/7609
When using the server
, you should explicitly set --ctx-size 0
or --ctx-size 8192
to support the native context size, eg:
./server -ngl 99 -m shisa-v1-llama3-70b.Q4_K_M.gguf --host 0.0.0.0 --port 8080 --chat-template llama3 --ctx-size 0
Model | Average | ELYZA-tasks-100 | MT-Bench | Rakuda | Tengu-Bench |
---|---|---|---|---|---|
shisa-ai/shisa-v1-llama3-70b | 7.30 | 7.34 | 7.67 | 8.15 | 6.04 |
shisa-ai/shisa-v1-llama3-70b.Q4_K_M | 7.22 | 7.22 | 7.27 | 8.20 | 6.19 |
For additional quants, including lower-bit iMatrix quants, see: https://huggingface.co/mradermacher/shisa-v1-llama3-70b-GGUF
split big files:
split -b 40G -d --additional-suffix=.part shisa-v1-llama3-70b.bf16.gguf shisa-v1-llama3-70b.bf16.gguf
put it back together:
cat shisa-v1-llama3-70b.bf16.gguf*.part > shisa-v1-llama3-70b.bf16.gguf
ensure the order
cat $(ls -v shisa-v1-llama3-70b.bf16.gguf*.part) > shisa-v1-llama3-70b.bf16.gguf
Conversion script: https://github.com/shisa-ai/shisa-v2/blob/main/convert/gguf.sh