gobean committed
Commit a9bffd9
1 Parent(s): 4278800

Update README.md

Files changed (1)
  1. README.md +5 -1
README.md CHANGED
@@ -21,8 +21,12 @@ datasets:
 
 This is the [llamafile](https://github.com/Mozilla-Ocho/llamafile) for [Dolphin 2.9 Llama 3 8b](https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b).
 
-Quick tests show it's good but not as sharp as the base model, using just some few shot prompts looking for precision when asking about history and science. More tests will have to be done to compare this and WizardLM-7B to see how much the finetuning did to Llama-3-8B.
+Quick tests show it's good but not as sharp as the base model, using just a few few-shot prompts checking for precision on history and science questions. More tests will be needed to compare this and WizardLM-7B to see how much the finetuning/new EOS changed Llama-3-8B.
 
+Notably, [cognitivecomputations](https://huggingface.co/cognitivecomputations) uses a single EOS token. This fixes the garbled-output bug. Hooray! It may, however, prevent some of the intended behavior of Llama 3's internal monologue/thoughts that adds to the model's apparent sharpness. Download Meta's original weights and load them manually in Python (see the sketch after the diff) to see what it's capable of by comparison. We're all awaiting fixes to llama.cpp and/or the base GGUF structure. In the meantime, this dolphin is a good fix and excellent work.
+
+conversion notes:
+I converted the original safetensors to f32 to preserve the fidelity of the bf16 weights, then quantized the GGUFs from there (commands sketched below). It's unclear what most GGUFs on HF are doing if they don't say.
 
 size notes:
 Windows users, go for q3-k-m. Others, use the biggest one that works on your machine. FreeBSD users, you're the real heroes.
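
For the comparison against Meta's original weights mentioned above, something like the following works. A minimal sketch, assuming the `transformers` library, access to the gated `meta-llama/Meta-Llama-3-8B-Instruct` repo, and enough VRAM; the prompt is just an illustrative history question:

```python
# Minimal sketch: load Meta's original weights and query them directly,
# assuming access to the gated repo and a recent `transformers` install.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the original weights are bf16
    device_map="auto",
)

# Llama 3 has two stop tokens; honoring both is exactly the detail the
# single-EOS GGUF workaround papers over.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

# Illustrative prompt; swap in the same history/science questions used above.
messages = [{"role": "user", "content": "In what year did the Battle of Hastings take place?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, eos_token_id=terminators)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```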
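
And a rough sketch of the conversion path from the notes above, assuming a llama.cpp checkout with the current tool names (`convert_hf_to_gguf.py` and `llama-quantize`; older checkouts call them `convert-hf-to-gguf.py` and `quantize`). Paths and filenames are illustrative:

```python
# Rough sketch of the f32-then-quantize pipeline described in the
# conversion notes; assumes it runs from the llama.cpp checkout root.
import subprocess

model_dir = "dolphin-2.9-llama3-8b"  # local clone of the safetensors repo
f32_gguf = "dolphin-2.9-llama3-8b-f32.gguf"

# Step 1: bf16 safetensors -> f32 GGUF, skipping a lossy bf16 -> f16 hop.
subprocess.run(
    ["python", "convert_hf_to_gguf.py", model_dir,
     "--outtype", "f32", "--outfile", f32_gguf],
    check=True,
)

# Step 2: quantize from the f32 master, e.g. Q3_K_M for the Windows-sized build.
subprocess.run(
    ["./llama-quantize", f32_gguf, "dolphin-2.9-llama3-8b-Q3_K_M.gguf", "Q3_K_M"],
    check=True,
)
```

Going through f32 is lossless for bf16 source weights (f32 can represent every bf16 value exactly), which is presumably the fidelity point above; a direct bf16-to-f16 step can clip the wider bf16 exponent range.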