eryk-mazus committed on
Commit
e2d9808
1 Parent(s): e9f2763

Update README.md

Files changed (1): README.md +5 -1
README.md CHANGED
@@ -43,4 +43,8 @@ Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
 
 Change `-c 2048` to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically.
 
-If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`
+If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`
+
+## Notes on performance
+
+For a model of this size, stronger quantization appears to degrade quality much more than it does for larger models. Personally, I would advise sticking with `fp16` or `int8` for this model.
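For reference, the flags discussed in the diff combine into an invocation along these lines (a sketch only; the model filename is a placeholder, not taken from this commit):

```shell
# Sketch of a chat-style llama.cpp run using the flags from the README diff.
# ./model.fp16.gguf is a placeholder filename, not from this repo.
# -ngl 32 : layers offloaded to GPU (remove if you have no GPU acceleration)
# -c 2048 : desired context / sequence length
# -i -ins : interactive instruct mode, used in place of -p "<PROMPT>"
./main -m ./model.fp16.gguf -ngl 32 -c 2048 -i -ins
```

Adjust `-ngl` and `-c` to your hardware and model as described above.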