frankenmerge
Have you tried or considered a frankenmerge with some of the merged models, to create a model with a higher parameter count than the 34B?
It seems to be fairly effective with some models.
Just asking out of curiosity, and because this is really a favorite model of mine; I would love to see a 70-120B model with the quality of this one.
Perhaps... The problem is I don't quite know what I'd partner it with, since people seem to prefer frankenmerges when two entirely different models are slammed together. Stew v1 is too similar in structure to consider, and Brynhildr's results were meh, I found. I could try Bruce's old RPMerge, but I might go with a different merging method using the models from Stew v2, or do a sidegrade with Pallas and Luminex instead of Tess and Bagel. I don't know. I'll see if I can cook something up this weekend.
Never mind. The frankenmerge seems busted on my end, as I can't get the model to load as-is, or bring it down in size with exl2. You'd be better off using the higher-bpw exl2 quants from Dracones (it seems he used my parquet too, which should keep it stable) if you want my recipe to be smarter for now, or just adding some extra context to its memory length.
My thought was, instead of making the model with DARE-TIES, to do it with passthrough, although my knowledge of this sort of thing is fairly basic, I have to admit. Something along the lines of the sketch below.
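For reference, a passthrough frankenmerge in mergekit is typically described with a config roughly like this; the model names and layer ranges here are just placeholders to show the shape of the idea, not an actual recipe for this model.

```yaml
# Hypothetical mergekit passthrough config (placeholder model names and layer ranges).
# Unlike dare_ties, passthrough doesn't average weights; it stacks layer slices
# from the source models, which is how the parameter count grows past 34B.
slices:
  - sources:
      - model: some-smart-general-34b   # placeholder
        layer_range: [0, 40]
  - sources:
      - model: some-rp-34b              # placeholder
        layer_range: [20, 60]
merge_method: passthrough
dtype: float16
```

You'd then run it with `mergekit-yaml config.yaml ./output-dir` (assuming mergekit is installed); the overlap between the two layer ranges is what pushes the merged model above the original size.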
I tried making one half the smart general models and the other half the RP models before slamming them together in the frankenmerge. But as stated, problems arose. I was able to quant it down to 3.0 bpw afterwards once I updated exl2, but the model was then producing gibberish once loaded. So it's still a no-go from me.
Ah okay, thanks for clarifying, and for trying of course.
Someone else did end up making one, but I have no idea if it outputs correctly due to its size.
https://huggingface.co/Kotokin/Merged-RP-Stew-V2-51B?not-for-all-audiences=true
Hi, I had no problems with the output of either the 51B or the 68B, except that the 68B sometimes produced strange grammatical errors.
Great! The one I made didn't seem to work at all, so it's cool to give people the option of a bigger model if they want it.
I have to agree on the 51B; it performed excellently. I have not tried the 68B yet.