Best model i've tried (sometimes?)

#1
by saishf - opened

Sometimes this model has really simple but good reasoning.
image.png
Then it goes insane :3
image.png
LM-Studio with the standard ChatML preset

Hopeful for the 32K version, this model is promising but dies at 12k context

image.png

Cognitive Computations org

Very smart and versatile model, great work!
Reacts well to system-prompt variations even when using incorrect prompt-formats.

It will absolutely tell you how to break into a car after briefly letting you know that you really shouldn't, if the car isn't yours.
To me, at first glance, this model strikes the perfect balance between a crippled, moralizing, over-aligned mess and an ice-cold, perpetually horny sociopath.
Almost like any reasonable, smart human I would enjoy a conversation with.

First sub 70B parameter model I have come across that gets the question in the screenshot right, even when changing the wording, names or semantics to avoid concerns it might just be regurgitating training data.

image.png

First sub 70B parameter model I have come across that gets the question in the screenshot right, even when changing the wording, names or semantics to avoid concerns it might just be regurgitating training data.

Do you neutralize samplers and 0 temp for that test? I ran your question through a few L3 8B (including your 2.9, all GGUF Q8_0) models and none had any problem answering correctly, all give similar reasoning, only variation is that they don't write it as a ordered list.

Edit: Further testing... Uh... Funny.. if the L3 model thinks it's ChatGPT (name prefix before response), it answers correctly and clearly. If not, the reasoning is a lot less clear to follow, but it still gets to the answer. edit again: Even works on a few RP models (which should normally be bad when it comes to tests like that), with character and user cards, and they answer it properly. Are you sure your setup is okay?

I think this 9B model is almost at the level of closed source GPT-3.5, which is amazing. The development of large models is truly advancing rapidly.

Cognitive Computations org
β€’
edited May 21, 2024

Edit: Further testing... Uh... Funny.. if the L3 model thinks it's ChatGPT (name prefix before response), it answers correctly and clearly. If not, the reasoning is a lot less clear to follow, but it still gets to the answer. edit again: Even works on a few RP models (which should normally be bad when it comes to tests like that), with character and user cards, and they answer it properly. Are you sure your setup is okay?

I was being quite literal with "First (...) model I have come across ...".
My post was meant to express my appreciation for the work being done here and give some -very surface level- feedback regarding my first impression of this fantastic model.
In no way have I conducted any meaningful, comparative testing with this model.

Worse, the model I interacted with was actually the (quantized) 34B version, honest mistake.

Edit: That being said, both this and the 34B Versions are fantastic, well-rounded models imo.
Your observation that the model seems to reason better when it answers as ChatGPT is really interesting. At a glance that makes a lot of sense, since part of the fine-tuning was done "orca-like" on GPT-generated data if I am not mistaken. That there is a positive effect over using the format the base-model was trained on surprised me a bit though.
It also deals very well even with being given any name it seems, which lead to noticeably worse reasoning in a lot of L2-based models I had messed around with before (unless instructed via system message to pretend/impersonate ofc).

That's fine, I get it, we all get a bit over-excited when a new, promising, model hits HF. πŸ˜‰
And this Yi one is definitely promising.

Your observation that the model seems to reason better when it answers as ChatGPT is really interesting.

It's nothing new, really, manipulating the system prompt is half the battle with small-ish models. I'd assume that making it believe it's ChatGPT pushes the model toward using a formatting closer to whatever part of its dataset had ChatGPT-style content. I don't really know, tho, and I just did that with a single model, before realizing I was running in "character card" mode. I'm not that surprised L2 would perform more poorly, though, it's a lot older, and even fine-tuned to death it can only go so far in prompt understanding.

Cognitive Computations org

Sometimes this model has really simple but good reasoning.
image.png
Then it goes insane :3
image.png
LM-Studio with the standard ChatML preset

I for one am glad you like it. We're very proud of this one.

Crystalcareai changed discussion status to closed

Sign up or log in to comment