frankenmerge
Have you tried or considered a frankenmerge with some of the merged models, to create a model with a higher parameter count than the 34B?
It seems to be fairly effective with some models.
Just asking out of curiosity, and because this is really a favorite model of mine; I would love to see a 70-120B model with the quality of this one.
Perhaps... The problem is I don't quite know what I'd partner it with, since people seem to prefer frankenmerges when two entirely different models are slammed together. Stew v1 is too similar in structure to consider, and Brynhildr's results were meh, I found. I could try Bruce's old RPMerge, but I might go with a different merging method using the models from Stew v2, or do a sidegrade with Pallas and Luminex instead of Tess and Bagel. I don't know. I'll see if I can cook something up this weekend.
Never mind. The frankenmerge seems busted on my end, as I can't get the model to load as-is, or bring it down in size with exl2. You'd be better off using the higher-bpw exl2 quants from Dracones (it seems he used my parquet too, which should keep it stable) if you want my recipe to be smarter for now, or just adding some extra context to its memory length.
My thought was, instead of making the model with DARE-TIES, to do it with passthrough, although my knowledge of this sort of thing is fairly basic, I have to admit. Something along the lines of the sketch below.
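For reference, a passthrough frankenmerge in mergekit is typically described with a config roughly like this; the model names and layer ranges here are just placeholders to show the shape of the idea, not an actual recipe for this model.

```yaml
# Hypothetical mergekit passthrough config (placeholder model names and layer ranges).
# Unlike dare_ties, passthrough doesn't average weights; it stacks layer slices
# from the source models, which is how the parameter count grows past 34B.
slices:
  - sources:
      - model: some-smart-general-34b   # placeholder
        layer_range: [0, 40]
  - sources:
      - model: some-rp-34b              # placeholder
        layer_range: [20, 60]
merge_method: passthrough
dtype: float16
```

You'd then run it with `mergekit-yaml config.yaml ./output-dir` (assuming mergekit is installed); the overlap between the two layer ranges is what pushes the merged model above the original size.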
I tried making one half the smart general models and the other half the RP models before slamming them together in the frankenmerge. But as stated, problems arose. I was able to quant it down to 3.0 bpw afterwards once I updated exl2, but the model was then producing gibberish once loaded. So it's still a no-go from me.
Ah okay, thanks for clarifying, and for trying of course.
Someone else did end up making one, but I have no idea if it outputs correctly due to its size.
https://huggingface.co/Kotokin/Merged-RP-Stew-V2-51B?not-for-all-audiences=true
Hi, I had no problems with the output of either the 51B or the 68B, except that the 68B sometimes produced strange grammatical errors.
Great! The one I made didn't seem to work at all, so it's cool to give people the option of a bigger model if they want it.
I have to agree on the 51B; it performed excellently. I have not tried the 68B yet.