DavidAU committed
Commit a07068f · verified · 1 parent: 67d0430

Update README.md

Files changed (1):
  1. README.md +12 -9
README.md CHANGED
@@ -34,15 +34,7 @@ tags:
 pipeline_tag: text-generation
 ---
 
- <B>L3-Dark-Planet-8B-GGUF - Updates Dec 21 2024 (refreshed, upgraded and new quants):</B>
- - All quants have been "refreshed" - re-quanted with the latest LLAMACPP improvements: better instruction following and output generation across all quants.
- - All quants have also been upgraded with "more bits" for the output tensor (all set at Q8_0) and embed, for better performance (this is in addition to the "refresh").
- - New specialized quants (in addition to the refresh/upgrades): "max" and "max-cpu" (noted in the file name) for quants "Q2K" (max-cpu only), "IQ4_XS", "Q6_K" and "Q8_0".
- - "MAX": output tensor / embed at float 16. You get better instruction following/output generation than standard/upgraded quants.
- - "MAX-CPU": output tensor / embed at bfloat 16, which forces both of these onto the CPU (Nvidia cards / others will vary). This frees up VRAM at a cost in tokens/second, and you get better instruction following/output generation too.
- - "MAX-CPU" Example 1: Q8_0 Max-CPU: 2004 MB will load onto CPU/RAM, 7073 MB will load onto GPU/VRAM. The extra VRAM can be used for context. NOTE: "math" on the CPU is slightly more accurate than on the GPU, so you may get a better generation.
- - "MAX-CPU" Example 2: Q2_K Max-CPU: 2004 MB will load onto CPU/RAM, 2449 MB will load onto GPU/VRAM. The extra VRAM can be used for context. NOTE: "math" on the CPU is slightly more accurate than on the GPU, so you may get a better generation. You could run this model/quant on a 4 GB VRAM card.
- - Q8_0 (Max, Max-CPU) now clocks in at 9.5 bits per weight (average).
+ <B>UPDATED - Dec 21 2024: Quants refreshed, upgraded, plus new ones (better performance across all quants) - see below...</B>
 
 <h2>L3-Dark-Planet-8B-GGUF</h2>
 
@@ -82,6 +74,17 @@ dictate they have their own repos.
 The Imatrix versions of this model have even lower perplexity than both this model and Llama3 Instruct (1/2 order of magnitude
 lower than this model, a full order of magnitude lower than Llama3 Instruct), with enhanced output.
 
+ <B>QUANT Updates Dec 21 2024: Refreshed, Upgraded and New quants:</B>
+
+ - All quants have been "refreshed" - re-quanted with the latest LLAMACPP improvements: better instruction following and output generation across all quants.
+ - All quants have also been upgraded with "more bits" for the output tensor (all set at Q8_0) and embed, for better performance (this is in addition to the "refresh").
+ - New specialized quants (in addition to the refresh/upgrades): "max" and "max-cpu" (noted in the file name) for quants "Q2K" (max-cpu only), "IQ4_XS", "Q6_K" and "Q8_0".
+ - "MAX": output tensor / embed at float 16. You get better instruction following/output generation than standard/upgraded quants.
+ - "MAX-CPU": output tensor / embed at bfloat 16, which forces both of these onto the CPU (Nvidia cards / others will vary). This frees up VRAM at a cost in tokens/second, and you get better instruction following/output generation too.
+ - "MAX-CPU" Example 1: Q8_0 Max-CPU: 2004 MB will load onto CPU/RAM, 7073 MB will load onto GPU/VRAM. The extra VRAM can be used for context. NOTE: "math" on the CPU is slightly more accurate than on the GPU, so you may get a better generation.
+ - "MAX-CPU" Example 2: Q2_K Max-CPU: 2004 MB will load onto CPU/RAM, 2449 MB will load onto GPU/VRAM. The extra VRAM can be used for context. NOTE: "math" on the CPU is slightly more accurate than on the GPU, so you may get a better generation. You could run this model/quant on a 4 GB VRAM card.
+ - Q8_0 (Max, Max-CPU) now clocks in at 9.5 bits per weight (average).
+
 <B>Dark Planet Versions:</B>
 
 The newest Dark Planet 8B SpinFire, now with Llama 3.1 and uncensored:
 
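
As a rough sanity check on the update notes above, the quoted RAM/VRAM split and the "9.5 bits per weight" figure are consistent with each other. The short Python sketch below shows the arithmetic, and how such a quant is typically loaded with partial GPU offload via the llama-cpp-python bindings. The file name is a placeholder and the ~8.03B parameter count is an assumption for a Llama-3-8B class model; neither is taken from this repo.

```python
# Sanity check: does Example 1's RAM/VRAM split match ~9.5 bits per weight?
# The MB figures are treated as MiB; the parameter count is an assumption.
MIB = 1024 * 1024
cpu_mb, gpu_mb = 2004, 7073        # split quoted in "MAX-CPU" Example 1
n_params = 8.03e9                  # assumed size of a Llama-3-8B class model

total_bits = (cpu_mb + gpu_mb) * MIB * 8
print(f"~{total_bits / n_params:.2f} bits per weight")  # -> ~9.48, i.e. ~9.5

# Loading with partial GPU offload (pip install llama-cpp-python).
# n_gpu_layers sets how many layers go to VRAM; per the notes above, the
# BF16 output tensor / embed of a MAX-CPU quant is expected to run on the
# CPU, freeing VRAM for context. The model path below is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="L3-Dark-Planet-8B-max-cpu-q8_0.gguf",  # placeholder file name
    n_gpu_layers=-1,  # -1 offloads all layers that fit; lower it to save VRAM
    n_ctx=4096,       # context window; spare VRAM can fund a larger value
)
out = llm("Write the opening scene of a story set on a dark planet.",
          max_tokens=128)
print(out["choices"][0]["text"])
```

Whether the BF16 tensors really stay on the CPU depends on the llama.cpp build and GPU backend, as the notes themselves caution ("Nvidia cards / others will vary").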