DavidAU/Psyonic-Cetacean-Ultra-Quality-20b-GGUF

---
license: apache-2.0
---
<font color=red><h3> Ultra High Remaster of the incredible Psyonic-Cetacean-20b. </h3></font>

This is a Floating Point 32 (F32) upscale, in which all components and merges were remastered to floating point 32.
This includes all the merges (recreated from the master files) and, where possible, substituting full FP32 models.

The goal: carry forward maximum precision right up to the point where the model is converted to GGUF ("GGUFed").

This includes an F32 master file for GGUF too... at a whopping 78 GB.
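As a rough sanity check on that figure, a GGUF file's size is dominated by parameter count times bytes per weight. The sketch below assumes exactly 20 billion parameters (a rounded figure; the real count for this merge may differ slightly) and ignores metadata and tokenizer overhead.

```python
# Rough model-file size estimate: parameters x bytes per weight.
# ASSUMPTION: exactly 20e9 parameters; metadata/tokenizer overhead ignored.

BYTES_PER_WEIGHT = {"F32": 4, "F16": 2, "BF16": 2}


def approx_size(n_params: float, fmt: str) -> tuple:
    """Return (decimal GB, binary GiB) for n_params weights stored in fmt."""
    n_bytes = n_params * BYTES_PER_WEIGHT[fmt]
    return n_bytes / 1e9, n_bytes / 2**30


for fmt in ("F32", "BF16"):
    gb, gib = approx_size(20e9, fmt)
    print(f"{fmt}: ~{gb:.0f} GB (~{gib:.1f} GiB)")
```

Under these assumptions the F32 estimate is about 80 GB (roughly 74.5 GiB), in the same range as the 78 GB quoted above, and twice what a BF16 master of the same model would take.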

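Why?

Because F32 carries a 24-bit significand (about 7 significant decimal digits), while BF16 carries only 8 bits (about 2 to 3 digits): nearly 5 more DECIMAL digits of precision in every value.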
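The precision gap can be seen directly by rounding one value to each format. This sketch uses only the Python standard library: `to_f32` relies on `struct` packing, and `to_bf16` is a hypothetical helper that keeps the top 16 bits of the F32 bit pattern with round-to-nearest-even. It ignores NaN/Inf and rounds through F32 first, so treat it as an approximation of a real BF16 conversion.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (F64) to the nearest F32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Approximate BF16 rounding: keep the top 16 bits of the F32 pattern.

    Round-to-nearest-even on the 16 dropped bits; NaN/Inf are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1  # last kept bit, used to break ties to even
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Significand precision in bits (including the implicit leading 1).
for name, sig_bits in (("F32", 24), ("BF16", 8)):
    print(f"{name}: ~{sig_bits * math.log10(2):.2f} significant decimal digits")

x = 1 / 3
for name, fn in (("F32", to_f32), ("BF16", to_bf16)):
    y = fn(x)
    print(f"{name}: {y!r}  relative error {abs(y - x) / x:.1e}")
```

For 1/3, BF16 lands on 0.333984375, a relative error of about 2e-3, while F32 stays within about 3e-8.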

And as each merge / model is modified, there are "losses" along the way.

These losses are carried forward and in turn lead to more losses.

Small?

Yes... but multiplied by every merge and every compression step, across all 20 billion parameters.
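The compounding effect can be illustrated with a toy model. This is not the author's merge pipeline: each hypothetical "merge" step below simply adds a small random delta to every weight and stores the result back in the working format. Rounding to BF16 after every step is compared with rounding to F32 after every step and with rounding to BF16 only once at the end, using an unrounded float64 run as the reference. The weight count, number of merges and delta size are arbitrary illustrative choices.

```python
import random
import struct


def to_f32(x: float) -> float:
    """Round a Python float (F64) to the nearest F32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Approximate BF16 rounding (top 16 bits of F32, ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def simulate(n_weights: int = 1000, n_merges: int = 50, seed: int = 0) -> dict:
    rng = random.Random(seed)
    ref = [rng.gauss(0.0, 1.0) for _ in range(n_weights)]  # float64 reference
    f32 = [to_f32(w) for w in ref]
    bf16 = [to_bf16(w) for w in ref]

    for _ in range(n_merges):
        # Hypothetical "merge": add a small delta, then store in each format.
        deltas = [rng.gauss(0.0, 0.01) for _ in range(n_weights)]
        ref = [w + d for w, d in zip(ref, deltas)]
        f32 = [to_f32(w + d) for w, d in zip(f32, deltas)]
        bf16 = [to_bf16(w + d) for w, d in zip(bf16, deltas)]

    def mean_abs_err(ws: list) -> float:
        return sum(abs(a - b) for a, b in zip(ws, ref)) / n_weights

    return {
        "F32, rounded every merge": mean_abs_err(f32),
        "BF16, rounded every merge": mean_abs_err(bf16),
        "BF16, rounded once at end": mean_abs_err([to_bf16(w) for w in ref]),
    }


for label, err in simulate().items():
    print(f"{label}: mean abs error {err:.2e}")
```

In this toy setup the per-step BF16 rounding errors add up roughly like a random walk, so they grow with the number of merges, while the F32 path stays orders of magnitude smaller.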

<B>The result:</B>

At Q2K, an impressive drop of 533 points in perplexity (lower is better).
(vs. the original base model at Q2K: PPL = 9.8077 +/- 0.06821)

At Q4KM, an incredible drop of 976 points in perplexity.
(vs. the original base model at Q4KM: PPL = 8.7858 +/- 0.06074)

At Q6, an awesome drop of 234 points in perplexity.
(vs. the original base model at Q6: PPL = 8.6070 +/- 0.05907)
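The card does not define a "point". Reading it as one unit in the fourth decimal place of PPL (0.0001), which is consistent with the "+0.0008 ppl" versus "over 200 points" comparison below, the implied remastered perplexities can be worked out as follows. The `POINT` constant is that assumption, not a figure stated by the author.

```python
# ASSUMPTION: one "point" = 0.0001 PPL (one unit in the 4th decimal place).
POINT = 0.0001

# quant -> (original base-model PPL, reported drop in points), from this card
RESULTS = {
    "Q2K": (9.8077, 533),
    "Q4KM": (8.7858, 976),
    "Q6": (8.6070, 234),
}


def remastered_ppl(base_ppl: float, drop_points: int, point: float = POINT) -> float:
    """Implied PPL of the remaster: original PPL minus the reported drop."""
    return base_ppl - drop_points * point


for quant, (base, drop) in RESULTS.items():
    print(f"{quant}: {base:.4f} -> {remastered_ppl(base, drop):.4f}")
```

Under that reading, the remastered quants land at roughly 9.7544 (Q2K), 8.6882 (Q4KM) and 8.5836 (Q6).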

To put this in perspective: "Q6" now performs ABOVE the original full-precision version of "Psyonic-Cetacean-20b" (its perplexity is lower).

This is because a "Q6" quant is generally considered to be within "+0.0008 ppl" of the full, uncompressed / unquantized model, and the remaster's improvement exceeds that margin by over 200 points.
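The Q6 comparison can be checked with the same arithmetic. Assuming again that one point is 0.0001 PPL, and reading "+0.0008 ppl" as the most a Q6 quant's PPL can exceed the full-precision model's, the original full-precision PPL would sit between 8.6062 and 8.6070, which puts the remastered Q6 between 226 and 234 points below it. All constant names here are illustrative.

```python
# ASSUMPTIONS: 1 point = 0.0001 PPL; a Q6 quant's PPL is 0 to 0.0008 above
# that of the full-precision model.
POINT = 0.0001
Q6_ORIGINAL_PPL = 8.6070  # original model at Q6 (from this card)
Q6_DROP_POINTS = 234      # reported improvement of the remaster at Q6
Q6_MAX_PENALTY = 0.0008   # "+0.0008 ppl" for Q6 vs. full precision

q6_remaster_ppl = Q6_ORIGINAL_PPL - Q6_DROP_POINTS * POINT

# Range for the original full-precision PPL implied by the Q6 penalty.
full_precision_low = Q6_ORIGINAL_PPL - Q6_MAX_PENALTY
full_precision_high = Q6_ORIGINAL_PPL

margin_min = (full_precision_low - q6_remaster_ppl) / POINT
margin_max = (full_precision_high - q6_remaster_ppl) / POINT
print(f"Remastered Q6 is {margin_min:.0f} to {margin_max:.0f} points below full precision")
```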

<B>The bottom line here is:</B>

Higher quality instruction following and output.

Likewise, you can use a smaller quant (more compression) for higher tokens per second and still get great quality.

Same great model... turbocharged.

This is the first group of remasters.

<B>More Coming soon...</B>

It will be followed by a "reg quant plus", which adds additional components into the GGUF (all) at floating point 32 precision to further increase the sheer creativity and raw AI horsepower.

Following this group will be full-precision Imatrix and Imatrix Plus repos that will push the limit even more.

Details of all methods employed to make these high-precision remasters will be posted shortly, along with comparisons of the original model and the new ultra remaster.