eryk-mazus
committed on
Commit a3ec953
1 Parent(s): e2d9808
Update README.md
README.md CHANGED
@@ -22,6 +22,8 @@ prompt_template: '<|im_start|>system
 
 *I've copy-pasted some information from TheBloke's model cards, hope it's ok*
 
+For a model of this size, quality appears to decline much more under stronger quantization than it does for larger models. Personally, I would advise sticking with `fp16` or `int8` for this model.
+
 ## Prompt template: ChatML
 
 ```
@@ -44,7 +46,3 @@ Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don
 
 Change `-c 2048` to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically.
 
 If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`.
-
-## Notes on performance
-
-For a model of this size, quality appears to decline much more under stronger quantization than it does for larger models. Personally, I would advise sticking with `fp16` or `int8` for this model.
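To make the flags discussed in this hunk concrete, here is a minimal sketch of a llama.cpp `main` invocation. The GGUF filename is a placeholder (substitute the actual file for this model), and the prompt is just an example ChatML string:

```shell
# Placeholder filename; use the actual GGUF file for this model.
# -ngl 32 : offload 32 layers to the GPU (remove for CPU-only inference)
# -c 2048 : sequence length
# -e      : process escape sequences (\n) in the prompt string
./main -m ./model.q8_0.gguf -ngl 32 -c 2048 -e \
  -p "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant"

# For a chat-style conversation, replace -p <PROMPT> with -i -ins:
./main -m ./model.q8_0.gguf -ngl 32 -c 2048 -i -ins
```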