DavidAU committed
Commit a07068f · verified · 1 parent: 67d0430

Update README.md

Files changed (1):
  1. README.md +12 -9
README.md CHANGED
@@ -34,15 +34,7 @@ tags:
 pipeline_tag: text-generation
 ---
 
- <B>L3-Dark-Planet-8B-GGUF - Updates Dec 21 2024 (refreshed, upgraded and new quants):</B>
- - All quants have been "refreshed" - re-quanted with the latest LLAMACPP improvements: better instruction following and output generation across all quants.
- - All quants have also been upgraded with "more bits" for the output tensor (all set at Q8_0) and embed, for better performance (this is in addition to the "refresh").
- - New specialized quants (in addition to the refresh/upgrades): "max" and "max-cpu" (noted in the file name) for quants "Q2K" (max-cpu only), "IQ4_XS", "Q6_K" and "Q8_0".
- - "MAX": output tensor / embed at float 16. You get better instruction following/output generation than standard/upgraded quants.
- - "MAX-CPU": output tensor / embed at bfloat 16, which forces both of these onto the CPU (Nvidia cards / others will vary). This frees up VRAM at a cost in tokens/second, and you get better instruction following/output generation too.
- - "MAX-CPU" Example 1: Q8_0 Max-CPU: 2004 MB will load onto CPU/RAM, 7073 MB will load onto GPU/VRAM. The extra VRAM can be used for context. NOTE: "math" on the CPU is slightly more accurate than on the GPU, so you may get a better generation.
- - "MAX-CPU" Example 2: Q2_K Max-CPU: 2004 MB will load onto CPU/RAM, 2449 MB will load onto GPU/VRAM. The extra VRAM can be used for context. NOTE: "math" on the CPU is slightly more accurate than on the GPU, so you may get a better generation. You could run this model/quant on a 4 GB VRAM card.
- - Q8_0 (Max, Max-CPU) now clocks in at 9.5 bits per weight (average).
+ <B>UPDATED - Dec 21 2024: Quants refreshed, upgraded, plus new ones (better performance across all quants) - see below...</B>
 
 <h2>L3-Dark-Planet-8B-GGUF</h2>
 
@@ -82,6 +74,17 @@ dictate they have their own repos.
 The Imatrix versions of this model have even lower perplexity than both this model and Llama3 Instruct (1/2 order of magnitude
 lower than this model, a full order of magnitude lower than Llama3 Instruct), with enhanced output.
 
+ <B>QUANT Updates Dec 21 2024: Refreshed, Upgraded and New quants:</B>
+
+ - All quants have been "refreshed" - re-quanted with the latest LLAMACPP improvements: better instruction following and output generation across all quants.
+ - All quants have also been upgraded with "more bits" for the output tensor (all set at Q8_0) and embed, for better performance (this is in addition to the "refresh").
+ - New specialized quants (in addition to the refresh/upgrades): "max" and "max-cpu" (noted in the file name) for quants "Q2K" (max-cpu only), "IQ4_XS", "Q6_K" and "Q8_0".
+ - "MAX": output tensor / embed at float 16. You get better instruction following/output generation than standard/upgraded quants.
+ - "MAX-CPU": output tensor / embed at bfloat 16, which forces both of these onto the CPU (Nvidia cards / others will vary). This frees up VRAM at a cost in tokens/second, and you get better instruction following/output generation too.
+ - "MAX-CPU" Example 1: Q8_0 Max-CPU: 2004 MB will load onto CPU/RAM, 7073 MB will load onto GPU/VRAM. The extra VRAM can be used for context. NOTE: "math" on the CPU is slightly more accurate than on the GPU, so you may get a better generation.
+ - "MAX-CPU" Example 2: Q2_K Max-CPU: 2004 MB will load onto CPU/RAM, 2449 MB will load onto GPU/VRAM. The extra VRAM can be used for context. NOTE: "math" on the CPU is slightly more accurate than on the GPU, so you may get a better generation. You could run this model/quant on a 4 GB VRAM card.
+ - Q8_0 (Max, Max-CPU) now clocks in at 9.5 bits per weight (average).
+
 <B>Dark Planet Versions:</B>
 
 The newest Dark Planet 8B SpinFire, now with Llama 3.1 and uncensored:
 
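
As a rough sanity check on the update notes above, the quoted RAM/VRAM split and the "9.5 bits per weight" figure are consistent with each other. The short Python sketch below shows the arithmetic, and how such a quant is typically loaded with partial GPU offload via the llama-cpp-python bindings. The file name is a placeholder and the ~8.03B parameter count is an assumption for a Llama-3-8B class model; neither is taken from this repo.

```python
# Sanity check: does Example 1's RAM/VRAM split match ~9.5 bits per weight?
# The MB figures are treated as MiB; the parameter count is an assumption.
MIB = 1024 * 1024
cpu_mb, gpu_mb = 2004, 7073        # split quoted in "MAX-CPU" Example 1
n_params = 8.03e9                  # assumed size of a Llama-3-8B class model

total_bits = (cpu_mb + gpu_mb) * MIB * 8
print(f"~{total_bits / n_params:.2f} bits per weight")  # -> ~9.48, i.e. ~9.5

# Loading with partial GPU offload (pip install llama-cpp-python).
# n_gpu_layers sets how many layers go to VRAM; per the notes above, the
# BF16 output tensor / embed of a MAX-CPU quant is expected to run on the
# CPU, freeing VRAM for context. The model path below is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="L3-Dark-Planet-8B-max-cpu-q8_0.gguf",  # placeholder file name
    n_gpu_layers=-1,  # -1 offloads all layers that fit; lower it to save VRAM
    n_ctx=4096,       # context window; spare VRAM can fund a larger value
)
out = llm("Write the opening scene of a story set on a dark planet.",
          max_tokens=128)
print(out["choices"][0]["text"])
```

Whether the BF16 tensors really stay on the CPU depends on the llama.cpp build and GPU backend, as the notes themselves caution ("Nvidia cards / others will vary").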