license: other
license_name: llama3
license_link: LICENSE
---
# Quant Infos

- quants done with an importance matrix for improved quantization loss
- K & IQ quants in basically all variants from Q6_K down to IQ_S
- fixed eos token for instruct mode (`<|eot_id|>`, id 128009), see the note below

Quantized with [llama.cpp](https://github.com/ggerganov/llama.cpp) commit [0d56246f4b9764158525d894b96606f6163c53a8](https://github.com/ggerganov/llama.cpp/commit/0d56246f4b9764158525d894b96606f6163c53a8) (master from 2024-04-18)
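If you want to build the same tree to reproduce the quants, it should look roughly like this (a sketch; plain `make` gives a CPU-only build, see the llama.cpp README for GPU backends):

```
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout 0d56246f4b9764158525d894b96606f6163c53a8
make
```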
Using this command to generate the importance matrix from the f16.gguf with the groups_merged.txt dataset:
```
./imatrix -c 512 -m $model_name-f16.gguf -f $llama_cpp_path/groups_merged.txt -o $out_path/imat-f16-gmerged.dat
```
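The resulting matrix is then passed to the quantize tool via `--imatrix`. A minimal sketch of that step (output name and quant type are placeholders):

```
./quantize --imatrix $out_path/imat-f16-gmerged.dat $model_name-f16.gguf $model_name-IQ4_XS.gguf IQ4_XS
```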
## Note about eos token
Llama 3 seems to use a different eos token depending on whether it is in instruct mode. The initial upload had an issue with this: it used the "default" eos token id 128001, but in instruct mode the model only emits 128009 (`<|eot_id|>`) as its eos token, which caused it to ramble on and on without stopping.

I have uploaded fixed quants with the eos token id manually set to 128009.
This fixes the issue for me, but you also have to make sure to use the correct chat template: ~~I recommend using [this](https://github.com/ggerganov/llama.cpp/pull/6751) PR~~ (it has since been merged, just use the newest llama.cpp master) and then launch llama.cpp with `--chat-template llama3`.
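For example, launching the llama.cpp server with that template (a sketch; the model file name and context size are placeholders):

```
./server -m $model_name-Q6_K.gguf --chat-template llama3 -c 8192
```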

If you do not want to redownload, you can fix your local gguf file with this command:
```
python3 ./path-to-llama.cpp/gguf-py/scripts/gguf-set-metadata.py $file tokenizer.ggml.eos_token_id 128009 --force
```
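To double-check that the change took effect, the gguf-dump.py script from the same directory should print the metadata back (a sketch, assuming the script is present in your llama.cpp checkout; `--no-tensors` skips the tensor listing):

```
python3 ./path-to-llama.cpp/gguf-py/scripts/gguf-dump.py --no-tensors $file | grep eos_token_id
```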

------------------------

# Original Model Card:

## Model Details