Quants with iMatrix for: https://huggingface.co/TeeZee/Kyllene-34B-v1.1

---

TeeZee's Kyllene model is one of the best Yi-34B merges around, together with those of BruceTheMoose. But one thing distinguishes it: it uses Gryphe's MergeMonster as a tool to trim out the GPTisms, Yisms, and Llamaisms and give a more natural output.

The clearing of the problematic GPTisms, Llamaisms, and Yiisms specified to MergeMonster is noticeable. It feels like the model is freed of those sequences, which act as a kind of "EOS chain of tokens" in many models, in the sense that they conclude many outputs in an unwanted way.

It's quite a step in the right direction and should become standard practice. It makes me wonder about the future, when we'll get Miqu 70B models properly finetuned on the best datasets AND with the Mistralisms trimmed out as well.

---

Available quants:

Full offload possible on 48GB VRAM with a huge context size: Q8_0
Full offload possible on 36GB VRAM with a huge context size: Q5_K_S
Full offload possible on 24GB VRAM with a big to huge context size (from 12288 with Q4_K_M, for example): Q4_K_M, Q4_K_S, Q3_K_M
Full offload possible on 16GB VRAM with a decent context size: IQ3_XXS SOTA on the way (equivalent to a Q3_K_S, with more context!), Q2_K, Q2_K_S on the way
Full offload possible on 12GB VRAM with a decent context size: IQ2_XS SOTA on the way
Lower quality: IQ2_XXS SOTA on the way

A rough loading sketch follows at the end of this card.

---

The merge parameters and logs are in the repo: https://huggingface.co/TeeZee/Kyllene-34B-v1.1/tree/main
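---

As a minimal sketch (not part of this card, and not the only way to run these files), here is how one of the quants above could be loaded with llama-cpp-python. The local file name, context size, and prompt are assumptions; pick the quant and n_ctx that fit your VRAM, per the list above.

```python
from llama_cpp import Llama

# Load a Q4_K_M quant with full GPU offload and a 12288-token context,
# matching the "24GB VRAM" tier listed above. File name is hypothetical.
llm = Llama(
    model_path="Kyllene-34B-v1.1.Q4_K_M.gguf",  # local GGUF file (assumed name)
    n_gpu_layers=-1,   # offload all layers to the GPU (full offload)
    n_ctx=12288,       # context size; lower it if you run out of VRAM
)

# Simple completion call; returns an OpenAI-style dict.
out = llm("Write a short scene set in a rainy harbor town.", max_tokens=256)
print(out["choices"][0]["text"])
```

Smaller quants (IQ3_XXS, IQ2_XS, etc.) load the same way; only the file name and the context size you can afford change.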