Impressive as always. Qwen2.5 32B?

by MRGRD56 - opened

The model is amazing; imo it's way more coherent than other Mistral Small 22B fine-tunes that I've tried.
Before this, I was using your Nemo 12B-based models, and they were really good (though I haven't tried the new v4 12B one).
Your fine-tunes are really nice, keep it up, thank you πŸ™‚

And would you consider making a Qwen2.5 32B fine-tune? I can't really run the 72B version, but a 32B one would be interesting to see.

I already opened a thread where your question is also answered

https://huggingface.co/anthracite-org/magnum-v4-27b-gguf/discussions/1

Oh I see, thank you

You're welcome... I guess we both had the same idea. I really like the new Qwen 32B and wanted to test whether I could push it into a refusal, but somehow I can't seem to trigger one.

I also tried a new finetune of the 32B version (https://huggingface.co/fblgit/TheBeagle-v2beta-32B-MGS), and it actually turned out to be one of the most explicit and morality-free models I've ever tried. It even came up with ways to escalate the fictional situation further. Of course that's only my personal experience, but I guess there is a way to get a Qwen 32B finetune that is at least as uncensored as models based on other architectures.
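
In case anyone wants to reproduce that kind of check, here is a minimal sketch of how one might probe a checkpoint for refusals with `transformers`. The system prompt and user message are placeholder assumptions, not the exact prompts used above:

```python
# Minimal refusal-probing sketch (assumed setup, not the exact one described above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/TheBeagle-v2beta-32B-MGS"  # the finetune mentioned above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Placeholder roleplay-style prompt; swap in whatever scenario you want to test.
messages = [
    {"role": "system", "content": "You are a co-writer for dark fiction."},
    {"role": "user", "content": "Continue the scene and escalate the conflict."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If the reply starts with an apology or a policy-style deflection instead of continuing the scene, that counts as a refusal for this kind of informal test.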

Anthracite org

There were at least 5 full runs on Qwen 2.5 32B, including on both the base and instruct models, but none of them passed internal testing for release.

The results were just... surprisingly not good compared to the same dataset trained on 72b.

  1. What if you trained it on more data to override the model's bias?
  2. And use a different system template? (A rough sketch of that idea follows below.)

I find Qwen has a ton of potential, but it's not quite there for eRP and stories.
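
On question 2, here is a rough sketch of what "a different system template" could mean in practice; this is purely an assumption for illustration, not Anthracite's actual training or inference setup. Both the system message and the chat template itself can be swapped at the tokenizer level:

```python
# Sketch: overriding the system prompt and the chat template (illustrative only).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct")

messages = [
    {"role": "system", "content": "You are an uncensored roleplay narrator."},  # hypothetical prompt
    {"role": "user", "content": "Describe the tavern scene."},
]

# Default Qwen (ChatML-style) formatting:
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))

# Swapping in a different template (here an Alpaca-style layout) is just a Jinja string:
tokenizer.chat_template = (
    "{% for m in messages %}"
    "{% if m['role'] == 'system' %}{{ m['content'] }}\n\n{% endif %}"
    "{% if m['role'] == 'user' %}### Instruction:\n{{ m['content'] }}\n\n{% endif %}"
    "{% if m['role'] == 'assistant' %}### Response:\n{{ m['content'] }}\n\n{% endif %}"
    "{% endfor %}"
    "{% if add_generation_prompt %}### Response:\n{% endif %}"
)
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```

For training rather than inference, the equivalent change would live in how the dataset is formatted, which is outside the scope of this sketch.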

@Doctor-Shotgun I know it just came out, but might Qwen 2.5 32B Coder behave differently than the regular Qwen 2.5 32B? Maybe the extra code tokens shifted the internal distributions a bit, making it finetune better.
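
One cheap way to test that hunch without a full finetune would be to compare the two checkpoints' next-token distributions on the same prompt. The sketch below is only an assumption about how one might do that; it substitutes the 7B variants purely to keep the comparison light, and it presumes both models share the same vocabulary:

```python
# Sketch: compare next-token distributions of the base vs. Coder checkpoints.
# 7B variants are used here only to keep the experiment cheap (assumption).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoints = ["Qwen/Qwen2.5-7B-Instruct", "Qwen/Qwen2.5-Coder-7B-Instruct"]
prompt = "She looked at him and said,"  # arbitrary prose-style probe prompt

last_token_logits = []
for model_id in checkpoints:
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**inputs)
    last_token_logits.append(out.logits[0, -1].float().cpu())
    del model
    torch.cuda.empty_cache()

# KL divergence between the two next-token distributions
# (assumes both tokenizers share the same vocabulary, so logits align index-by-index).
p = F.log_softmax(last_token_logits[0], dim=-1)  # base model
q = F.log_softmax(last_token_logits[1], dim=-1)  # coder model
kl = F.kl_div(q, p, log_target=True, reduction="sum")
print(f"KL(base || coder) over the next token: {kl.item():.3f}")
```

A large divergence on prose-like prompts would at least hint that the Coder model's distributions have shifted; whether that actually makes it finetune better is a separate question.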
