Impressive as always. Qwen2.5 32B?

by MRGRD56 - opened

The model is amazing; imo it's way more coherent than other Mistral Small 22B fine-tunes that I've tried.
Before this, I was using your Nemo 12B-based models, and they were really good (though I haven't tried the new v4 12B one).
Your fine-tunes are really nice, keep it up, thank you πŸ™‚

And would you consider making a Qwen2.5 32B fine-tune? I can't really run the 72B version, but a 32B one would be interesting to see.

I already opened a thread where your question is also answered

https://huggingface.co/anthracite-org/magnum-v4-27b-gguf/discussions/1

Oh I see, thank you

You're welcome... I guess we both had the same idea. I really like the new Qwen 32B and wanted to test whether I could push it into a refusal, but somehow I can't seem to trigger one.

I also tried a new finetune of the 32B version (https://huggingface.co/fblgit/TheBeagle-v2beta-32B-MGS), and it actually turned out to be one of the most explicit and morality-free models I've ever tried. It even came up with ways to escalate the fictional situation further. Of course that's only my personal experience, but I guess there is a way to get a Qwen 32B finetune that is at least as uncensored as models based on other architectures.
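
In case anyone wants to reproduce that kind of check, here is a minimal sketch of how one might probe a checkpoint for refusals with `transformers`. The system prompt and user message are placeholder assumptions, not the exact prompts used above:

```python
# Minimal refusal-probing sketch (assumed setup, not the exact one described above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/TheBeagle-v2beta-32B-MGS"  # the finetune mentioned above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Placeholder roleplay-style prompt; swap in whatever scenario you want to test.
messages = [
    {"role": "system", "content": "You are a co-writer for dark fiction."},
    {"role": "user", "content": "Continue the scene and escalate the conflict."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If the reply starts with an apology or a policy-style deflection instead of continuing the scene, that counts as a refusal for this kind of informal test.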

Anthracite org

There were at least 5 full runs on Qwen 2.5 32B, including on both the base and instruct models, but none of them passed internal testing for release.

The results were just... surprisingly not good compared to the same dataset trained on 72b.

  1. What if you trained it on more data to override the model's bias?
  2. And use a different system template? (A rough sketch of that idea follows below.)

I find Qwen has a ton of potential, but it's not quite there for eRP and stories.
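
On question 2, here is a rough sketch of what "a different system template" could mean in practice; this is purely an assumption for illustration, not Anthracite's actual training or inference setup. Both the system message and the chat template itself can be swapped at the tokenizer level:

```python
# Sketch: overriding the system prompt and the chat template (illustrative only).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct")

messages = [
    {"role": "system", "content": "You are an uncensored roleplay narrator."},  # hypothetical prompt
    {"role": "user", "content": "Describe the tavern scene."},
]

# Default Qwen (ChatML-style) formatting:
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))

# Swapping in a different template (here an Alpaca-style layout) is just a Jinja string:
tokenizer.chat_template = (
    "{% for m in messages %}"
    "{% if m['role'] == 'system' %}{{ m['content'] }}\n\n{% endif %}"
    "{% if m['role'] == 'user' %}### Instruction:\n{{ m['content'] }}\n\n{% endif %}"
    "{% if m['role'] == 'assistant' %}### Response:\n{{ m['content'] }}\n\n{% endif %}"
    "{% endfor %}"
    "{% if add_generation_prompt %}### Response:\n{% endif %}"
)
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```

For training rather than inference, the equivalent change would live in how the dataset is formatted, which is outside the scope of this sketch.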

@Doctor-Shotgun I know it just came out, but might Qwen 2.5 32B Coder behave differently than the regular Qwen 2.5 32B? Maybe the extra code tokens shifted the internal distributions a bit, making it finetune better.
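
One cheap way to test that hunch without a full finetune would be to compare the two checkpoints' next-token distributions on the same prompt. The sketch below is only an assumption about how one might do that; it substitutes the 7B variants purely to keep the comparison light, and it presumes both models share the same vocabulary:

```python
# Sketch: compare next-token distributions of the base vs. Coder checkpoints.
# 7B variants are used here only to keep the experiment cheap (assumption).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoints = ["Qwen/Qwen2.5-7B-Instruct", "Qwen/Qwen2.5-Coder-7B-Instruct"]
prompt = "She looked at him and said,"  # arbitrary prose-style probe prompt

last_token_logits = []
for model_id in checkpoints:
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**inputs)
    last_token_logits.append(out.logits[0, -1].float().cpu())
    del model
    torch.cuda.empty_cache()

# KL divergence between the two next-token distributions
# (assumes both tokenizers share the same vocabulary, so logits align index-by-index).
p = F.log_softmax(last_token_logits[0], dim=-1)  # base model
q = F.log_softmax(last_token_logits[1], dim=-1)  # coder model
kl = F.kl_div(q, p, log_target=True, reduction="sum")
print(f"KL(base || coder) over the next token: {kl.item():.3f}")
```

A large divergence on prose-like prompts would at least hint that the Coder model's distributions have shifted; whether that actually makes it finetune better is a separate question.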
