leonardlin committed
Commit: 6366671
Parent(s): 07a24b6

Update README.md

README.md CHANGED

See https://huggingface.co/shisa-ai/shisa-v1-llama3-70b for the original model.

I was seeing corruption issues at extended context lengths, but this appears to be due to llama.cpp's `server` defaulting to a small context window.

See: https://github.com/ggerganov/llama.cpp/issues/7609

When using the `server`, you should explicitly set `--ctx-size 0` or `--ctx-size 8192` to use the model's native context size, eg:
```
./server -ngl 99 -m shisa-v1-llama3-70b.Q4_K_M.gguf --host 0.0.0.0 --port 8080 --chat-template llama3 --ctx-size 0
```
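
For a quick check that the server is up and answering, you can hit llama.cpp's OpenAI-compatible chat endpoint; a minimal sketch, assuming the host/port from the command above:
```
# simple smoke test against the llama.cpp server started above
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "こんにちは。自己紹介してください。"}]}'
```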

| Model                                   | Average | ELYZA-tasks-100 | MT-Bench | Rakuda   | Tengu-Bench |
|-----------------------------------------|---------|-----------------|----------|----------|-------------|
| **shisa-ai/shisa-v1-llama3-70b.Q4_K_M** | **7.22**| **7.22**        | **7.27** | **8.20** | **6.19**    |

For additional quants, including lower-bit iMatrix quants, see: https://huggingface.co/mradermacher/shisa-v1-llama3-70b-GGUF

Split big files:
```
split -b 40G -d --additional-suffix=.part shisa-v1-llama3-70b.bf16.gguf shisa-v1-llama3-70b.bf16.gguf
```
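
With `-d --additional-suffix=.part` (and the file name reused as the split prefix), the pieces come out numbered sequentially, so you can sanity-check what is there with something like:
```
# list the split parts in numeric order
ls -v shisa-v1-llama3-70b.bf16.gguf*.part
```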

Put it back together:
```
cat shisa-v1-llama3-70b.bf16.gguf*.part > shisa-v1-llama3-70b.bf16.gguf
```

Or, to make sure the parts are concatenated in the right order:
```
cat $(ls -v shisa-v1-llama3-70b.bf16.gguf*.part) > shisa-v1-llama3-70b.bf16.gguf
```
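
To verify the reassembled file is byte-identical to the original, comparing checksums is the simplest check, for example:
```
# record a checksum before splitting (or grab it from wherever the original lives)
sha256sum shisa-v1-llama3-70b.bf16.gguf > shisa-v1-llama3-70b.bf16.gguf.sha256

# after reassembly, confirm it matches
sha256sum -c shisa-v1-llama3-70b.bf16.gguf.sha256
```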

Conversion script: https://github.com/shisa-ai/shisa-v2/blob/main/convert/gguf.sh
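
The linked script is the source of truth here; as a rough sketch, the usual llama.cpp flow for producing files like these looks something like the following (paths, output type, and quant choice are illustrative assumptions, not taken from the script):
```
# convert the original HF checkpoint to a bf16 GGUF (run from a llama.cpp checkout)
python convert-hf-to-gguf.py /path/to/shisa-v1-llama3-70b --outtype bf16 --outfile shisa-v1-llama3-70b.bf16.gguf

# quantize the bf16 GGUF down to Q4_K_M
./quantize shisa-v1-llama3-70b.bf16.gguf shisa-v1-llama3-70b.Q4_K_M.gguf Q4_K_M
```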