leonardlin committed
Commit 6366671
1 Parent(s): 07a24b6

Update README.md

Files changed (1):
  1. README.md +11 -10
README.md CHANGED
@@ -1,10 +1,13 @@
- **NOTE: DO NOT USE THESE QUANTS, suffers from corruption issues!**

- (this repo is only public atm for debugging purposes)

- See https://huggingface.co/shisa-ai/shisa-v1-llama3-70b for the working model

- First turn seems to work well (eg, benchmarks fine) but after about turn three, the model starts to output random tokens...

  | Model | Average | ELYZA-tasks-100 | MT-Bench | Rakuda | Tengu-Bench |
  |----------------------------------------|---------|-----------------|----------|--------|-------------|
@@ -12,11 +15,10 @@ First turn seems to work well (eg, benchmarks fine) but after about turn three,
  | **shisa-ai/shisa-v1-llama3-70b.Q4_K_M**| **7.22**| **7.22** | **7.27** | **8.20** | **6.19** |


- ---

- Quick and dirty GGUF quants. Maybe some iMatrix soon. BF16 conversion included in this repo.

- split:
  ```
  split -b 40G -d --additional-suffix=.part shisa-v1-llama3-70b.bf16.gguf shisa-v1-llama3-70b.bf16.gguf
  ```
@@ -26,10 +28,9 @@ put it back together:
  ```
  cat shisa-v1-llama3-70b.bf16.gguf*.part > shisa-v1-llama3-70b.bf16.gguf
  ```

- insure order
  ```
  cat $(ls -v shisa-v1-llama3-70b.bf16.gguf*.part) > shisa-v1-llama3-70b.bf16.gguf
  ```

-
-
+ See https://huggingface.co/shisa-ai/shisa-v1-llama3-70b for the original model.

+ I was seeing corruption issues at extended context lengths, but this appears to be due to llama.cpp's `server` defaulting to a small context window.

+ See: https://github.com/ggerganov/llama.cpp/issues/7609

+ When using the `server`, you should explicitly set `--ctx-size 0` or `--ctx-size 8192` to support the native context size, e.g.:
+ ```
+ ./server -ngl 99 -m shisa-v1-llama3-70b.Q4_K_M.gguf --host 0.0.0.0 --port 8080 --chat-template llama3 --ctx-size 0
+ ```
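
A quick multi-turn sanity check against the running server (a minimal sketch, assuming llama.cpp's built-in OpenAI-compatible `/v1/chat/completions` endpoint at the host/port configured above):
```
# Minimal sketch: assumes the server command above is running and
# listening on localhost:8080.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "こんにちは。簡単に自己紹介してください。"}
    ]
  }'
```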

  | Model | Average | ELYZA-tasks-100 | MT-Bench | Rakuda | Tengu-Bench |
  |----------------------------------------|---------|-----------------|----------|--------|-------------|

  | **shisa-ai/shisa-v1-llama3-70b.Q4_K_M**| **7.22**| **7.22** | **7.27** | **8.20** | **6.19** |

+ For additional quants, including lower-bit iMatrix quants, see: https://huggingface.co/mradermacher/shisa-v1-llama3-70b-GGUF

+ split big files:
  ```
  split -b 40G -d --additional-suffix=.part shisa-v1-llama3-70b.bf16.gguf shisa-v1-llama3-70b.bf16.gguf
  ```
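
Not in the original README, but before removing the unsplit file it's worth recording a checksum so the reassembled copy can be verified later (a minimal sketch):
```
# Record a checksum of the original file before splitting / deleting it
sha256sum shisa-v1-llama3-70b.bf16.gguf > shisa-v1-llama3-70b.bf16.gguf.sha256
```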
 
  put it back together:
  ```
  cat shisa-v1-llama3-70b.bf16.gguf*.part > shisa-v1-llama3-70b.bf16.gguf
  ```
 
+ ensure the parts are concatenated in order:
  ```
  cat $(ls -v shisa-v1-llama3-70b.bf16.gguf*.part) > shisa-v1-llama3-70b.bf16.gguf
  ```
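
After reassembly, the file can be checked against the checksum recorded earlier (again a sketch, assuming the `.sha256` file from the step above exists):
```
# Verify the reassembled file matches the pre-split original
sha256sum -c shisa-v1-llama3-70b.bf16.gguf.sha256
```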
+ Conversion script: https://github.com/shisa-ai/shisa-v2/blob/main/convert/gguf.sh
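
The script itself isn't reproduced in the commit; as a rough sketch (an assumption about what such a script typically wraps, not taken from `gguf.sh`), the standard llama.cpp flow looks like:
```
# Rough sketch only; assumes llama.cpp's standard conversion tools, not the
# actual contents of the linked gguf.sh script.
python convert-hf-to-gguf.py /path/to/shisa-v1-llama3-70b \
  --outtype bf16 --outfile shisa-v1-llama3-70b.bf16.gguf
./quantize shisa-v1-llama3-70b.bf16.gguf shisa-v1-llama3-70b.Q4_K_M.gguf Q4_K_M
```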