Promising showcase model for sub-100b frankenmerges of 70b models!

by Nexesenex

I tested this 95b version last night, and I'm very pleased.

  • I usually dump 100b+ models after benching and a single chat, because at 2.5bpw or less they underperform, to my taste, relative to their huge VRAM requirements (I have "only" 36GB of VRAM, not 48). 95b is, for me, a sweeter spot for a decent quantization. It made me curious enough to build a GGUF quant strategy just to test it at its best on my rig (a custom "IQ2_LR" 2.7bpw GGUF quant in my case, sized for 8k context), and for once I actually used it before benching it. I'm uploading my IQ2_LR quant right now; a loading sketch follows the link below.

https://huggingface.co/Nexesenex/LaDameBlanche-v2-95b-iMat-CQ.GGUF
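For anyone wanting to try it, here is a minimal sketch of loading a quant like this at 8k context, assuming the llama-cpp-python bindings; the local file name is hypothetical and may differ from the uploaded one:

```python
# Minimal smoke test, assuming llama-cpp-python is installed
# (pip install llama-cpp-python). The file name below is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="LaDameBlanche-v2-95b-IQ2_LR.gguf",  # hypothetical local file name
    n_ctx=8192,       # the 8k context this quant was sized for
    n_gpu_layers=-1,  # offload all layers to VRAM (reduce if it doesn't fit in 36GB)
)

out = llm("The old château had been empty for decades, until", max_tokens=64)
print(out["choices"][0]["text"])
```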

  • And what a treat! It's very good even below 3bpw: not dumb at all, very creative, and it rivals QuartetAnemoi with a more colorful touch. I need to test more, but I'm optimistic!
  • The benches are solid at 2.7bpw: Arc Challenge 57, Arc Easy 77, and a perplexity at 512 context of 4.5860. Those core results are, for me, quite "Miqu vanilla" (even with a slight perplexity bump of around 0.4-0.5 ppl, which can also be attributed to the merged models themselves, at least in the case of Midnight Miqu 1.5), and that's progress compared to what I saw before with bigger Miqu frankenmerges. The recipe you used holds up in benchmarks too, even without running the whole stack of llama.cpp benches. (A short note on how that perplexity figure is computed follows below.)
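Since "Perplexity 512" figures get compared a lot in these threads: perplexity is just the exponential of the mean per-token negative log-likelihood over 512-token windows, which is why a small per-token NLL shift shows up as a 0.4-0.5 ppl bump. A toy sketch, with illustrative numbers only:

```python
import math

def perplexity(token_nlls: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Toy numbers only: a mean NLL of ~1.523 nats corresponds to the
# ~4.586 PPL mentioned above, and a bump of just 0.1 nat per token
# lifts PPL by roughly 0.48 - the scale of the gap discussed here.
print(perplexity([1.523] * 512))  # ~4.586
print(perplexity([1.623] * 512))  # ~5.068
```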

In conclusion, congratulations and thank you! I'm enthusiastic about such sub-100b variants of the 70b frankenmerges. I've seen that the "mergers" community is very active in trying to find breakthroughs, and, as others have said, the technique you used might be a step forward, much as TeeZee's Kyllene 34b was (in my opinion) a while ago with its successful use of MergeMonster.

Very cool. Link added to the model card.

I had almost lost hope! It's been quite a while since anything interesting happened in the 70b+ GGUF space.
