Joseph717171 committed on
Commit
b8bcd85
1 Parent(s): 367f434

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -3,4 +3,4 @@ Custom GGUF quants of arcee-ai’s [Llama-3.1-SuperNova-Lite-8B](https://hugging
  Update: For some reason, the model was initially smaller than LLama-3.1-8B-Instruct after quantizing. This has since been rectified: if you want the most intelligent and most capable quantized GGUF version of Llama-3.1-SuperNova-Lite-8.0B, use the OF32.EF32.IQuants.
  The original OQ8_0.EF32.IQuants will remain in the repo for those who want to use them. Cheers! 😁
 
- Addendum: The 0Q8_0.EF32.IQuants are right for the model's size; I was just being naive: I was comparing my OQ8_0.EF32 IQuants of Llama-3.1-SuperNova-Lite-8B to that of my OQ8_0.EF32 IQuants of Hermes-3-Llama-3.1-8B - thinking they were both the same size as my OQ8_0.EF32.IQuants of LLama-3.1-8B-Instruct; they're not: Hereme-3-Llama-3.1-8B is bigger. So, now we have both OQ8_0.EF32.IQuants and OF32.EF32.IQuants, and they're both great quant schemes. The only difference is being, of course, that OF32.EF32.IQuants have even more accuracy at the expense of more vRAM. So, there you have it. I'm a dumbass, but I learned something, and now we have even more quantizations to play with now. Cheers! 😂😁
 
+ Addendum: The 0Q8_0.EF32.IQuants are right for the model's size; I was just being naive: I was comparing my OQ8_0.EF32 IQuants of Llama-3.1-SuperNova-Lite-8B to that of my OQ8_0.EF32 IQuants of Hermes-3-Llama-3.1-8B - thinking they were both the same size as my OQ8_0.EF32.IQuants of LLama-3.1-8B-Instruct; they're not: Hereme-3-Llama-3.1-8B is bigger. So, now we have both OQ8_0.EF32.IQuants and OF32.EF32.IQuants, and they're both great quant schemes. The only difference is being, of course, that OF32.EF32.IQuants have even more accuracy at the expense of more vRAM. So, there you have it. I'm a dumbass, but its okay because I learned something, and now we have even more quantizations to play with now. Cheers! 😂😁