Unintentionally more stable than v4.
Pleasantly surprised with this one so far; it seems to be giving decent results on my end.
@Lewdiculous
I left the image the same. It's basically v4 with Mistral 0.2's config and weights, built from two sub-merges combined into this one. It should have close to 32k context w/o SWA if everything went correctly.
It's all happy accidents
0.2 bringing good luck
Long context and vision?
The audacity!
@Nitral-AI It's quanting. Unrelated, but lemme sell you on something cool:
https://github.com/ajeetdsouza/zoxide
It's addictive and I can never use regular cd anymore after this. I recommend you fully replace your cd with it.
@Nitral-AI – You have done it. Graced by the glory of Mistral 0.2, not even I am complaining about V4 (yet? KEKW). Congratulations!
As per feedback from LocalBasedMan, author of Erosumika, the 0.2 base really benefits from very tame, low-temperature sampling settings; that's where it is most stable while still not feeling particularly repetitive. I have uploaded presets I consider "good starting points" here:
https://huggingface.co/Lewdiculous/Model-Requests/tree/main/data/presets/lewdicu-3.0.2-mistral-0.2
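To illustrate what "tame" means in practice, here's a rough sketch of the shape such a preset takes. These are hypothetical numbers, not the values from the linked presets, and the parameter names just follow common llama.cpp-style samplers:

```python
# Hypothetical illustration only: NOT the values from the linked presets,
# just an example of what a "tame, low-temperature" sampler config looks like.
low_temp_preset = {
    "temperature": 0.8,      # keep randomness modest
    "min_p": 0.05,           # prune very unlikely tokens instead of a hard top_k cutoff
    "top_p": 1.0,            # effectively disabled while min_p does the filtering
    "repeat_penalty": 1.05,  # light touch, enough to avoid loops without flattening style
}
```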
For Eris V4: formatting is good (cards tested: Chiaki, where multiple characters speak/act per response, at 170 tokens/response; and Mesugaki Correction School, with RPG-style status information in responses, at 350 tokens/response), writing seems good, and vision is good. I haven't tested intelligence in particular, but she is adhering to the appropriate characters, so I'll take that as a positive indication.
Appreciate the feedback, my dude. This one took way longer to pull off than I wanted, but I'm glad it worked out. Thank you as always for the quants as well. Regarding intelligence, it's kind of hard for me to say: it's definitely not as smart as 3.05 raw, but it doesn't feel hugely off from 3.075.
@Nitral-AI I'm thinking about adding some explicit NSFW chats to the imatrix calibration data, some entries from the recent RP-NSFW-test database from the fellow Chaotics; I believe it's on Replacement. But anyway, do you have a way to eval quant quality? I was thinking about directly comparing two different Q4_K_M-imat quants of V4-32K, or maybe the IQ3_M instead, to look for a more dramatic difference.
Would you be able to measure things like KL divergence? It's not a decisive score on its own, but still... I just don't have this set up, so I'd like to see if you do.
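Roughly what I have in mind, as a sketch only: assuming we can dump per-token logits for the same eval text from both the f16 reference and the quant (the shapes and file names below are made up), the comparison would look like this:

```python
# Sketch: mean KL divergence of a quant's token distributions from the f16
# reference, over the same eval text. Logit dumps and file names are assumed.
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mean_kl_divergence(ref_logits, quant_logits, eps=1e-10):
    """Average KL(P_ref || P_quant) over token positions.

    Both arrays have shape (n_tokens, vocab_size) and come from running the
    two models over the exact same token sequence.
    """
    p = softmax(ref_logits)    # f16 reference distribution per position
    q = softmax(quant_logits)  # quantized-model distribution per position
    kl_per_token = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(kl_per_token.mean())

# Hypothetical usage: lower mean KL means the quant tracks the f16 model more closely.
# ref = np.load("logits_f16.npy"); qnt = np.load("logits_q4km.npy")
# print(mean_kl_divergence(ref, qnt))
```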
You could run perplexity from within llama.cpp on the quants, although I don't think that's the end-all-be-all test, generally speaking or in this case. @Lewdiculous
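For what the PPL number itself boils down to, here's a minimal sketch, assuming you already have the per-token log-probabilities of the eval text from the quant (llama.cpp's perplexity tool handles all of this for you):

```python
# Sketch: perplexity is exp(mean negative log-likelihood) over the eval tokens.
# The log-probabilities here are assumed to come from some external dump.
import numpy as np

def perplexity(token_logprobs):
    """PPL from natural-log per-token probabilities; lower is better."""
    nll = -np.asarray(token_logprobs, dtype=float)
    return float(np.exp(nll.mean()))

# Hypothetical usage: run the same eval file through two quants and compare.
# print(perplexity(np.load("logprobs_q4km.npy")))
```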
And no, I don't have anything set up for GGUF quants to test quant quality. I typically use the per-layer accuracy score against f16 from the quant (you can see this when making exl2 quants) and average it over the 32 hidden layers. But even that isn't a perfect method, since it only tells you how accurate the quant is to f16 as the basis of comparison.
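The rough idea in code, with the caveat that this uses cosine similarity between captured layer outputs as the per-layer score, which may not be exactly the metric exl2 reports, and capturing the outputs is left to whatever stack you use:

```python
# Sketch: average "accuracy vs f16" over the hidden layers. The per-layer score
# here is cosine similarity between each layer's outputs on the same inputs;
# how the f16/quant layer outputs are captured is assumed, not shown.
import numpy as np

def layer_score(f16_out, quant_out):
    """Cosine similarity between flattened layer outputs (1.0 = identical)."""
    a, b = np.ravel(f16_out), np.ravel(quant_out)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def average_layer_accuracy(f16_layers, quant_layers):
    """Mean per-layer score over all hidden layers (32 for a Mistral-7B-sized model)."""
    return float(np.mean([layer_score(f, q) for f, q in zip(f16_layers, quant_layers)]))
```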
PPL will do. Unless something looks very wrong.