Quants with iMatrix for: https://huggingface.co/TeeZee/Kyllene-34B-v1.1
---
TeeZee's Kyllene is one of the best Yi-34B merges around, alongside those of BruceTheMoose.
But one little thing distinguishes it:
It uses Gryphe's MergeMonster as a tool to trim out the GPTisms, Yisms, and Llamaisms, and give a more natural output.
The removal of the problematic GPTisms, Llamaisms, and Yiisms specified to MergeMonster is noticeable,
and the model feels freed of these sequences, which act as a kind of "EOS chain of tokens" in many models, in the sense that they conclude many outputs in an unwanted way.
It's quite a step in the right direction and should become standard practice.
It makes me wonder about the future, when we'll get Miqu 70B models properly finetuned on the best datasets AND with the Mistralisms trimmed out as well.
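To make the idea concrete: MergeMonster's approach can be thought of as checking how likely a candidate merge is to emit unwanted phrases, and preferring merges that make them less likely. Below is a rough, self-contained sketch of that kind of measurement using the Hugging Face transformers library; the model name, context, and phrase are placeholders, and this is not MergeMonster's actual code.

```python
# Rough illustration (not MergeMonster's actual code): score how likely a model
# is to emit an unwanted phrase, e.g. a classic GPTism. A merge that lowers
# this score is less prone to falling into that phrase.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TeeZee/Kyllene-34B-v1.1"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

def phrase_logprob(context: str, phrase: str) -> float:
    """Sum of log-probabilities the model assigns to `phrase` after `context`."""
    ctx_ids = tok(context, return_tensors="pt").input_ids
    phrase_ids = tok(phrase, add_special_tokens=False, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, phrase_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Log-probs of each phrase token given everything before it
    logprobs = torch.log_softmax(logits[0, ctx_ids.shape[1] - 1 : -1], dim=-1)
    return logprobs.gather(1, phrase_ids[0].unsqueeze(1)).sum().item()

# Lower is better when the phrase is one of those unwanted "EOS chains":
print(phrase_logprob("The hero sheathed his sword.",
                     " In conclusion, it is important to note that"))
```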
---
Available quants:

Full offload possible on 48GB VRAM with a huge context size:
Q8_0

Full offload possible on 36GB VRAM with a huge context size:
Q5_K_S

Full offload possible on 24GB VRAM with a big to huge context size (from 12288 with Q4_K_M, for example; see the sample usage after this list):
Q4_K_M, Q4_K_S, Q3_K_M

Full offload possible on 16GB VRAM with a decent context size:
IQ3_XXS SOTA (on the way; equivalent to a Q3_K_S but with room for more context!), Q2_K, Q2_K_S (on the way)

Full offload possible on 12GB VRAM with a decent context size:
IQ2_XS SOTA (on the way)

Lower quality:
IQ2_XXS SOTA (on the way)
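For reference, a minimal sketch of loading one of these quants with the llama-cpp-python bindings (an assumption; llama.cpp's own CLI works just as well). The local file name, prompt, and context size are illustrative only; pick the quant and context that fit your VRAM per the list above.

```python
# Minimal sketch: full GPU offload of a Q4_K_M quant with a 12288 context,
# assuming the llama-cpp-python bindings are installed.
from llama_cpp import Llama

llm = Llama(
    model_path="Kyllene-34B-v1.1.Q4_K_M.gguf",  # hypothetical local file name
    n_gpu_layers=-1,   # offload all layers to the GPU (full offload)
    n_ctx=12288,       # context size; lower it if you run out of VRAM
)

out = llm("Write a short scene set in a rainy harbor town.", max_tokens=256)
print(out["choices"][0]["text"])
```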
---
The merge parameters and logs are in the repo: https://huggingface.co/TeeZee/Kyllene-34B-v1.1/tree/main