Joseph717171 committed on
Commit ea3562b · verified · 1 Parent(s): 62c2a5f

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -3,4 +3,4 @@ Custom GGUF quants of arcee-ai’s [Llama-3.1-SuperNova-Lite-8B](https://hugging
  Update: For some reason, the model was initially smaller than LLama-3.1-8B-Instruct after quantizing. This has since been rectified: if you want the most intelligent and most capable quantized GGUF version of Llama-3.1-SuperNova-Lite-8.0B, use the OF32.EF32.IQuants.
  The original OQ8_0.EF32.IQuants will remain in the repo for those who want to use them. Cheers! 😁
 
- Addendum: I'm stupid. I was comparing my OQ8_0.EF32 IQuants of Llama-3.1-SuperNova-Lite-8B to that of my OQ8_0.EF32 IQuants of Hermes-3-Llama-3.1-8B - thinking they were both the same size as my OQ8_0.EF32.IQuants of LLama-3.1-8B-Instruct; they're not: Hereme-3-Llama-3.1-8B is bigger. So, now we have both OQ8_0.EF32.IQuants and OF32.EF32.IQuants, and they're both great quant schemes. The only difference is being, of course, that OF32.EF32.IQuants have even more accuracy at the expense of more vRAM. So, there you have it. I'm a dumbass, but I learned something, and now we have even more quantizations to play with now. Cheers! 😂😁
+ Addendum: The 0Q8_0.EF32.IQuants are right for the model's size; I'm stupid, because I was comparing my OQ8_0.EF32 IQuants of Llama-3.1-SuperNova-Lite-8B to that of my OQ8_0.EF32 IQuants of Hermes-3-Llama-3.1-8B - thinking they were both the same size as my OQ8_0.EF32.IQuants of LLama-3.1-8B-Instruct; they're not: Hereme-3-Llama-3.1-8B is bigger. So, now we have both OQ8_0.EF32.IQuants and OF32.EF32.IQuants, and they're both great quant schemes. The only difference is being, of course, that OF32.EF32.IQuants have even more accuracy at the expense of more vRAM. So, there you have it. I'm a dumbass, but I learned something, and now we have even more quantizations to play with now. Cheers! 😂😁