See https://huggingface.co/shisa-ai/shisa-v1-llama3-70b for the original model.
I was seeing corruption issues at extended context lengths, but this appears to be due to llama.cpp's `server` defaulting to a small context window.
See: https://github.com/ggerganov/llama.cpp/issues/7609
When using the `server`, you should explicitly set `--ctx-size 0` or `--ctx-size 8192` so the model's native context size is used, e.g.:
```
./server -ngl 99 -m shisa-v1-llama3-70b.Q4_K_M.gguf --host 0.0.0.0 --port 8080 --chat-template llama3 --ctx-size 0
```
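As a quick smoke test that the server came up with those flags, you can hit its OpenAI-compatible chat endpoint (a minimal sketch; the host, port, and placeholder prompt are assumptions matching the command above):
```
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello, who are you?"}], "max_tokens": 64}'
```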
Benchmark scores for the original model vs. this Q4_K_M quant:

| Model | Average | ELYZA-tasks-100 | MT-Bench | Rakuda | Tengu-Bench |
|----------------------------------------|---------|-----------------|----------|--------|-------------|
| **shisa-ai/shisa-v1-llama3-70b** | **7.30**| **7.34** | **7.67** | **8.15** | **6.04** |
| **shisa-ai/shisa-v1-llama3-70b.Q4_K_M**| **7.22**| **7.22** | **7.27** | **8.20** | **6.19** |
For additional quants, including lower-bit iMatrix quants, see: https://huggingface.co/mradermacher/shisa-v1-llama3-70b-GGUF
Split big files:
```
split -b 40G -d --additional-suffix=.part shisa-v1-llama3-70b.bf16.gguf shisa-v1-llama3-70b.bf16.gguf
```
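With GNU `split`, `-d` produces zero-padded numeric suffixes, so the parts come out named like this (illustrative; the number of parts depends on the file size):
```
ls -1 shisa-v1-llama3-70b.bf16.gguf*.part
# shisa-v1-llama3-70b.bf16.gguf00.part
# shisa-v1-llama3-70b.bf16.gguf01.part
# ...
```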
Put it back together:
```
cat shisa-v1-llama3-70b.bf16.gguf*.part > shisa-v1-llama3-70b.bf16.gguf
```
To be safe, force the parts to concatenate in numeric order:
```
cat $(ls -v shisa-v1-llama3-70b.bf16.gguf*.part) > shisa-v1-llama3-70b.bf16.gguf
```
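To verify the reassembly was lossless, compare checksums before splitting and after rejoining (a standard-coreutils sketch, not part of the original workflow):
```
# before splitting, record the hash:
sha256sum shisa-v1-llama3-70b.bf16.gguf > shisa-v1-llama3-70b.bf16.gguf.sha256
# after reassembly, verify the rejoined file matches:
sha256sum -c shisa-v1-llama3-70b.bf16.gguf.sha256
```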
Conversion script: https://github.com/shisa-ai/shisa-v2/blob/main/convert/gguf.sh
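For orientation, the rough shape of such a conversion with llama.cpp's tooling looks like the sketch below (paths and flags are assumptions; the linked script is authoritative):
```
# convert the HF checkpoint (local directory) to a bf16 GGUF
python convert-hf-to-gguf.py /path/to/shisa-v1-llama3-70b \
  --outfile shisa-v1-llama3-70b.bf16.gguf --outtype bf16
# quantize the bf16 GGUF down to Q4_K_M
./quantize shisa-v1-llama3-70b.bf16.gguf shisa-v1-llama3-70b.Q4_K_M.gguf Q4_K_M
```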