BramVanroy/fietje-2-instruct-gguf


BramVanroy committed on May 1, 2024

Commit ff31f9e · verified · 1 Parent(s): f75574c

Update README.md

Files changed (1)

README.md +13 -10


@@ -8,16 +8,19 @@ tags:
 This repository contains quantized versions of [BramVanroy/fietje-2b-instruct](https://huggingface.co/BramVanroy/fietje-2b-instruct):
-- `-f16` (5.6GB): best quality, but largest and slowest (recommended if you have the capacity, otherwise q8_0)
-- `-q8_0` (3.0GB): minimal quality loss, smaller
-- `-q5_k_m` (2.0GB): users have reported considerable quality loss in the chat `q5_k_m` version so you may want to avoid it
-Also available on ollama:
-```sh
-# defaults to f16
-ollama run bramvanroy/fietje-2b-instruct
-ollama run bramvanroy/fietje-2b-instruct:f16
-ollama run bramvanroy/fietje-2b-instruct:q8_0
-ollama run bramvanroy/fietje-2b-instruct:q5_k_m
 ```

+Available quantization types and expected performance differences compared to base `f16`, higher perplexity=worse (from llama.cpp):
+```
+Q3_K_M  :  3.07G, +0.2496 ppl @ LLaMA-v1-7B
+Q4_K_M  :  3.80G, +0.0532 ppl @ LLaMA-v1-7B
+Q5_K_M  :  4.45G, +0.0122 ppl @ LLaMA-v1-7B
+Q6_K    :  5.15G, +0.0008 ppl @ LLaMA-v1-7B
+Q8_0    :  6.70G, +0.0004 ppl @ LLaMA-v1-7B
+F16     : 13.00G              @ 7B
+```
+Also available on [ollama](https://ollama.com/bramvanroy/fietje-2b-instruct).
+Quants were made with release [`b2777`](https://github.com/ggerganov/llama.cpp/releases/tag/b2777) of llama.cpp.
 ```
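
The quantization table added in this commit can be read as a size versus quality trade-off. As a rough illustration only, the sketch below parses that table and picks the smallest type whose perplexity increase stays within a chosen budget. The helper names `parse_quants` and `pick_quant` are hypothetical and not part of llama.cpp or this repository. The figures are the llama.cpp reference values for LLaMA-v1-7B quoted above, not measurements of the Fietje files, so the result is a starting point rather than a recommendation.

```python
import re

# Table copied from the README lines added in this commit.
# Reference figures for LLaMA-v1-7B, not measured on Fietje itself.
QUANT_TABLE = """\
Q3_K_M  :  3.07G, +0.2496 ppl @ LLaMA-v1-7B
Q4_K_M  :  3.80G, +0.0532 ppl @ LLaMA-v1-7B
Q5_K_M  :  4.45G, +0.0122 ppl @ LLaMA-v1-7B
Q6_K    :  5.15G, +0.0008 ppl @ LLaMA-v1-7B
Q8_0    :  6.70G, +0.0004 ppl @ LLaMA-v1-7B
F16     : 13.00G              @ 7B
"""

# name : sizeG, optionally followed by ", +delta ppl"
ROW = re.compile(r"^(\S+)\s*:\s*([\d.]+)G(?:,\s*\+([\d.]+) ppl)?")


def parse_quants(table):
    """Return (name, size_in_G, ppl_delta) tuples; F16 is the baseline (0.0)."""
    rows = []
    for line in table.splitlines():
        m = ROW.match(line)
        if m:
            name, size, ppl = m.groups()
            rows.append((name, float(size), float(ppl) if ppl else 0.0))
    return rows


def pick_quant(rows, max_ppl_delta):
    """Smallest quant whose perplexity increase is within the budget, or None."""
    ok = [r for r in rows if r[2] <= max_ppl_delta]
    return min(ok, key=lambda r: r[1])[0] if ok else None


if __name__ == "__main__":
    rows = parse_quants(QUANT_TABLE)
    print(pick_quant(rows, max_ppl_delta=0.02))
```

Tightening the budget moves the choice toward larger files. Because the table describes a 7B model, the absolute sizes do not match the Fietje files, although the ordering of the quantization types is the same.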