See https://huggingface.co/shisa-ai/shisa-v1-llama3-70b for the original model.

I was seeing corruption issues at extended context lengths, but this turned out to be due to llama.cpp's `server` defaulting to a small context window.

See: https://github.com/ggerganov/llama.cpp/issues/7609

When using the `server`, you should explicitly set `--ctx-size 0` or `--ctx-size 8192` to support the model's native context size (`--ctx-size 0` uses the context length stored in the model's metadata), e.g.:
```
./server -ngl 99 -m shisa-v1-llama3-70b.Q4_K_M.gguf --host 0.0.0.0 --port 8080 --chat-template llama3 --ctx-size 0
```
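
Once the server is up, a quick sanity check is to hit its OpenAI-compatible chat endpoint; a minimal sketch, assuming the host/port from the command above (the prompt is just an example):
```
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Hello! Please introduce yourself."}
        ]
      }'
```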

Benchmark comparison between the original model and this Q4_K_M quant:

| Model                                  | Average | ELYZA-tasks-100 | MT-Bench | Rakuda | Tengu-Bench |
|----------------------------------------|---------|-----------------|----------|--------|-------------|
| **shisa-ai/shisa-v1-llama3-70b**       | **7.30**| **7.34**        | **7.67** | **8.15** | **6.04**  |
| **shisa-ai/shisa-v1-llama3-70b.Q4_K_M**| **7.22**| **7.22**        | **7.27** | **8.20** | **6.19**  |


For additional quants, including lower-bit iMatrix quants, see: https://huggingface.co/mradermacher/shisa-v1-llama3-70b-GGUF


To split big files:
```
split -b 40G -d --additional-suffix=.part shisa-v1-llama3-70b.bf16.gguf shisa-v1-llama3-70b.bf16.gguf
```
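
With `-d --additional-suffix=.part`, GNU `split` writes numerically suffixed chunks (`...gguf00.part`, `...gguf01.part`, ...), which you can list in order with:
```
ls -v shisa-v1-llama3-70b.bf16.gguf*.part
```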

To put it back together:
```
cat shisa-v1-llama3-70b.bf16.gguf*.part > shisa-v1-llama3-70b.bf16.gguf
```

Or, to guarantee the parts are concatenated in the correct order:
```
cat $(ls -v shisa-v1-llama3-70b.bf16.gguf*.part) > shisa-v1-llama3-70b.bf16.gguf
```
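
To confirm the reassembled file is intact, compare checksums; a minimal sketch (record the hash before splitting, verify after reassembly):
```
# before splitting: record the checksum of the original file
sha256sum shisa-v1-llama3-70b.bf16.gguf > shisa-v1-llama3-70b.bf16.gguf.sha256

# after reassembly: verify the rebuilt file matches
sha256sum -c shisa-v1-llama3-70b.bf16.gguf.sha256
```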

Conversion script: https://github.com/shisa-ai/shisa-v2/blob/main/convert/gguf.sh