leonardlin committed
Commit: 6366671
Parent(s): 07a24b6

Update README.md

README.md CHANGED

See https://huggingface.co/shisa-ai/shisa-v1-llama3-70b for the original model.

I was seeing corruption issues at extended context lengths, but this appears to be due to llama.cpp's `server` defaulting to a small context window.

See: https://github.com/ggerganov/llama.cpp/issues/7609

When using the `server`, you should explicitly set `--ctx-size 0` or `--ctx-size 8192` to use the model's native context size, eg:
```
./server -ngl 99 -m shisa-v1-llama3-70b.Q4_K_M.gguf --host 0.0.0.0 --port 8080 --chat-template llama3 --ctx-size 0
```
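
For a quick check that the server is up and answering, you can hit llama.cpp's OpenAI-compatible chat endpoint; a minimal sketch, assuming the host/port from the command above:
```
# simple smoke test against the llama.cpp server started above
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "こんにちは。自己紹介してください。"}]}'
```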

| Model                                   | Average | ELYZA-tasks-100 | MT-Bench | Rakuda   | Tengu-Bench |
|-----------------------------------------|---------|-----------------|----------|----------|-------------|
| **shisa-ai/shisa-v1-llama3-70b.Q4_K_M** | **7.22**| **7.22**        | **7.27** | **8.20** | **6.19**    |

For additional quants, including lower-bit iMatrix quants, see: https://huggingface.co/mradermacher/shisa-v1-llama3-70b-GGUF

Split big files:
```
split -b 40G -d --additional-suffix=.part shisa-v1-llama3-70b.bf16.gguf shisa-v1-llama3-70b.bf16.gguf
```
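
With `-d --additional-suffix=.part` (and the file name reused as the split prefix), the pieces come out numbered sequentially, so you can sanity-check what is there with something like:
```
# list the split parts in numeric order
ls -v shisa-v1-llama3-70b.bf16.gguf*.part
```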

Put it back together:
```
cat shisa-v1-llama3-70b.bf16.gguf*.part > shisa-v1-llama3-70b.bf16.gguf
```

Or, to make sure the parts are concatenated in the right order:
```
cat $(ls -v shisa-v1-llama3-70b.bf16.gguf*.part) > shisa-v1-llama3-70b.bf16.gguf
```
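
To verify the reassembled file is byte-identical to the original, comparing checksums is the simplest check, for example:
```
# record a checksum before splitting (or grab it from wherever the original lives)
sha256sum shisa-v1-llama3-70b.bf16.gguf > shisa-v1-llama3-70b.bf16.gguf.sha256

# after reassembly, confirm it matches
sha256sum -c shisa-v1-llama3-70b.bf16.gguf.sha256
```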

Conversion script: https://github.com/shisa-ai/shisa-v2/blob/main/convert/gguf.sh
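
The linked script is the source of truth here; as a rough sketch, the usual llama.cpp flow for producing files like these looks something like the following (paths, output type, and quant choice are illustrative assumptions, not taken from the script):
```
# convert the original HF checkpoint to a bf16 GGUF (run from a llama.cpp checkout)
python convert-hf-to-gguf.py /path/to/shisa-v1-llama3-70b --outtype bf16 --outfile shisa-v1-llama3-70b.bf16.gguf

# quantize the bf16 GGUF down to Q4_K_M
./quantize shisa-v1-llama3-70b.bf16.gguf shisa-v1-llama3-70b.Q4_K_M.gguf Q4_K_M
```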