Thanks

#1
by altomek - opened

Thank you for making those quants.

BTW, impressive quants collection!

altomek changed discussion status to closed

It's the one thing I can do without expensive hardware, so it's good to hear that it's of use :)

Your quants fail, or are replaced by different model!

Something is seriously broken with this model in GGUF format. I am sorry, I do not think it is your fault. I made my own quants and they behave similarly. But they do not represent the full model's potential the way the ExLlamav2 quants do!

Interesting, normally the exllama quants are noticeably worse. When you say seriously broken, do you mean wrong/corrupted tokens or something like that, or just bad output quality? The former might be a bad tokenizer, which is all too common for models, and possibly might be worked around.

My model is such a rough experiment that you may be right and the tokenizer is broken. It loses all its empathy when used with koboldcpp or text-generation-webui in GGUF format. The difference in how it responds feels so drastic to me that I thought it was not my model. CodeRosa responds 100% with curiosity in this context, but in GGUF format it responds like all other models, with thanks, etc. It is so weird... I have to educate myself on tokenizers. Thank you for the suggestion! Could you add a mention in the model card that this model is experimental and that, for the best experience, the ExLlamav2 quants are recommended for now?

No, that's not a broken tokenizer. Also, this is much more likely to be a difference in sampler settings or something to that effect, and less likely to be a problem with the quants themselves. Or simply pure luck due to the inherent randomness.

Sorry, I overlooked your request to add a notice to the model at first - I have added a notice that people should check out the exllama quants before judging this model.

I appreciate your input greatly! After conducting additional tests, I discovered that certain versions of ExLlamav2 can be problematic as well! They do not represent the model's potential, which is a kind of emotional intelligence. It is a special experience with this model. When this model "feels" what you are about to say and has enough knowledge about the topic, it can exceed any other model I have ever tried. It is relatively easy to test with this model when you set temp to 0.1, top_k to 0, and top_p to 0, with context as previously discussed. I just want it to be understood by users, and now they will be able to test it themselves ;P. Coming back to the GGUF quants, they sometimes work, but I can hardly recommend them for now, knowing what users will miss in the experience. Maybe it is the lack of something like flash attention, I guess...? Still testing the GGUF quants.
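For context on why these settings make the comparison meaningful: with temp 0.1, top_k 0, and top_p 0, generation becomes essentially deterministic, so differences between quants are not just sampling noise. Below is a minimal sketch of what those sampler knobs do, not the actual koboldcpp/text-generation-webui implementation, and assuming llama.cpp's convention that top_k = 0 disables the k cutoff and that top-p always keeps at least the single most likely token:

```python
import math
import random

def sample(logits, temperature=0.1, top_k=0, top_p=0.0, rng=random.Random(0)):
    """Pick a token index from raw logits using temperature / top-k / top-p."""
    # Temperature scaling: low temperature sharpens the distribution.
    scaled = [l / max(temperature, 1e-8) for l in logits]
    # Softmax (shifted by the max for numerical stability).
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    # Sort candidates by probability, highest first.
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    # top_k = 0 conventionally disables the k cutoff (llama.cpp convention).
    if top_k > 0:
        probs = probs[:top_k]
    # top_p: keep the smallest prefix whose cumulative probability reaches p.
    # At least one token always survives, so top_p = 0 keeps only the single
    # most likely token -- i.e. effectively greedy decoding.
    kept, cum = [], 0.0
    for p, i in probs:
        kept.append((p, i))
        cum += p
        if cum >= top_p:
            break
    # Renormalize over the kept tokens and draw one.
    z = sum(p for p, _ in kept)
    r = rng.random() * z
    for p, i in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][1]
```

With top_p = 0 the kept set collapses to one token, so `sample` always returns the argmax of the logits, which is why repeated runs with these settings should give the same output and make quant-to-quant comparisons fair.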

Well, GGUF can also represent the source without quantisation (which I could upload), but of course, practically nobody can run those :) I'll change the text to reflect your findings.

Maybe I should add to use simple prompts. I usually use "You're in chat with {{user}}." and some simple character description, no prompt-hacking magic. The model may get confused by too many instructions about how it should be and behave. So my tests work with simple prompts.

I can add a recommended prompt format to the description here as well, if you wish.

You did a lot for me already. I think those who need this information will find it. I will also add it to the model README. Thank you!

The model README is indeed the best place. Cheers!
